Data Diff


Last updated 8 months ago

Telmai lets you compare datasets and identify the differences between them. This feature is useful for ensuring data consistency across tables, for example in data reconciliation or migration scenarios.

As part of scanning a dataset, Telmai checks for differences between the source and target tables (if configured). In the app, Telmai provides summary statistics on the number of new, missing, and changed records. Outside the app, Telmai stores the changed or differing records for further use or analysis.
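To make the three categories concrete, here is a minimal Python sketch of an ID-keyed comparison. It is an illustration only, not Telmai's implementation: the function name `diff_by_id`, the sample rows, and the reading of "new" as present only in the target and "missing" as present only in the source are all assumptions made for this example.

```python
def diff_by_id(source, target, id_attr):
    """Compare two lists of row dicts keyed on an ID attribute.

    Assumption for this sketch: "new" means the ID exists only in the
    target, "missing" means it exists only in the source, and "changed"
    means the ID exists in both but at least one field value differs.
    """
    src = {row[id_attr]: row for row in source}
    tgt = {row[id_attr]: row for row in target}

    new = sorted(tgt.keys() - src.keys())
    missing = sorted(src.keys() - tgt.keys())

    changed = {}
    for key in src.keys() & tgt.keys():
        fields = set(src[key]) | set(tgt[key])
        # Record (source_value, target_value) for every differing field.
        delta = {
            f: (src[key].get(f), tgt[key].get(f))
            for f in fields
            if src[key].get(f) != tgt[key].get(f)
        }
        if delta:
            changed[key] = delta

    return {"new": new, "missing": missing, "changed": changed}


# Hypothetical sample data
source_rows = [
    {"id": 1, "name": "Ada", "city": "London"},
    {"id": 2, "name": "Bob", "city": "Paris"},
    {"id": 3, "name": "Cy", "city": "Rome"},
]
target_rows = [
    {"id": 1, "name": "Ada", "city": "London"},
    {"id": 2, "name": "Bob", "city": "Berlin"},
    {"id": 4, "name": "Dee", "city": "Oslo"},
]

result = diff_by_id(source_rows, target_rows, "id")
summary = {kind: len(items) for kind, items in result.items()}
```

With this sample data, record 4 is new, record 3 is missing, and record 2 is changed in its `city` field. The comparison depends entirely on a stable ID attribute, which is why defining one is the first configuration step.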

Telmai’s Data Diff feature runs as part of a table’s regular scan. First, connect the two datasets you want to compare to Telmai by following the data source connection steps. Then, update the configuration of the table you want to compare (the target table) using these steps:

  1. Define the ID Attribute

  2. From the three-dot menu, click the “Data Comparison” option

  3. You will be prompted to fill in the following details:

    1. Source table: Dataset you want to compare to

    2. Result Destination: Output location for the Parquet files (AWS S3, Azure Blob, or Google Cloud Storage)

  4. Once these details are selected, you will be prompted for more details about the associated bucket

  5. The next scan will analyze the deltas between the two datasets, and alerts will be created if any deltas exist

Data Diff Alert Example

If any differences are detected between the source and target datasets, a “Data Difference” alert is created, similar to the picture below:

Clicking on the alert shows more details on the changed schema and records, similar to the picture below:

Lastly, by navigating to the output Parquet files, you can see more details on the changed records, such as the individual changes for each record.
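One way to inspect the output is to copy the Parquet files from the result destination to a local folder and load them into a DataFrame. The sketch below assumes that workflow. The helper names and the folder layout are hypothetical, and the column layout of the files is not described on this page, so check `df.columns` before relying on any field. Loading requires `pandas` with a Parquet engine such as `pyarrow` installed.

```python
from pathlib import Path


def find_parquet_files(root):
    """Return all .parquet files under a local folder, sorted by path."""
    return sorted(Path(root).rglob("*.parquet"))


def load_diff_records(root):
    """Load every Parquet file under `root` into one DataFrame.

    Assumes the diff output has already been copied locally from the
    configured result destination (hypothetical workflow).
    """
    import pandas as pd  # imported here so file discovery works without pandas

    files = find_parquet_files(root)
    if not files:
        raise FileNotFoundError(f"No Parquet files found under {root}")
    return pd.concat((pd.read_parquet(f) for f in files), ignore_index=True)
```

A typical session would call `df = load_diff_records("./data_diff_output")`, where the folder name is a placeholder, and then review `df.columns` and `df.head()` to see how the changes are laid out.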
