Data Binning

Data Binning Policy

Telmai offers a data binning feature to help manage data quality issues at the source. It lets you define a policy under which Telmai monitors data correctness and categorizes your data into "good" and "bad" bins. Good data continues through your pipeline, while bad or suspicious data is flagged for review.

This helps ensure that only good (or expected) data flows into your ecosystem, so your costly pipelines run only on healthy datasets.
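For example, a downstream job can be pointed only at the Valid Data Path so that records routed to the invalid bin never reach expensive processing. The sketch below is illustrative only, not Telmai code: the bucket, prefixes, and Parquet format are assumptions; substitute the paths and format you configure in your own policy.

```python
# Minimal sketch of a downstream step that consumes only the "good" bin.
# Assumptions (not from Telmai docs): binned output lands under an S3 prefix
# and is readable as Parquet; all names below are placeholders.
import pandas as pd  # reading s3:// paths also requires s3fs and pyarrow

VALID_DATA_PATH = "s3://example-bucket/telmai/valid/"       # hypothetical
CLEAN_OUTPUT_PATH = "s3://example-bucket/warehouse/clean/"  # hypothetical

def run_pipeline() -> None:
    # Only records that were routed to the valid path enter the costly step.
    df = pd.read_parquet(VALID_DATA_PATH)
    df.to_parquet(CLEAN_OUTPUT_PATH)  # placeholder for the real transformation

if __name__ == "__main__":
    run_pipeline()
```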

To enable this feature for a connected data source, you will need to:

  1. Configure the ID Attribute: Set the ID attribute for the data source.

  2. Set Data Expectation Rules: Define rules on the "Data Quality Rules" page.

  3. Create a Correctness Policy: Navigate to the "Alerts & Trends" page and create a correctness policy. This policy is used to scope the binning.

    1. You are now able to set your Data Binning policy.

  4. Click to set up the Data Binning policy; a prompt will ask you to define the policy details:

    1. Select the previously created correctness policy.

    2. Pick the desired bucket type (AWS-S3, GCP-Storage, or Azure-Blob).

      1. Once selected, you will need to enter the credentials for that bucket.

    3. Define the destination paths (illustrative path formats are shown after this list):

      1. Valid Data Path: Path for good (correct) data.

      2. Invalid Data Path: Path for bad (incorrect) data.
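The exact URI format Telmai expects for these paths is not spelled out here, but the hypothetical values below show the usual path conventions for each supported bucket type. Treat them purely as illustrations and use the bucket, container, and prefix names from your own environment.

```python
# Hypothetical Valid/Invalid Data Path values per bucket type (placeholders only).
VALID_PATHS = {
    "AWS-S3":      "s3://example-bucket/telmai/valid/",
    "GCP-Storage": "gs://example-bucket/telmai/valid/",
    "Azure-Blob":  "https://exampleaccount.blob.core.windows.net/example-container/telmai/valid/",
}
INVALID_PATHS = {
    "AWS-S3":      "s3://example-bucket/telmai/invalid/",
    "GCP-Storage": "gs://example-bucket/telmai/invalid/",
    "Azure-Blob":  "https://exampleaccount.blob.core.windows.net/example-container/telmai/invalid/",
}
```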

Once enabled, the binning will automatically take effect in your next data scan job.
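After a scan runs, you may want to check what landed in each bin before promoting the batch downstream. One way to do that for an AWS-S3 bucket is sketched below; this is not part of Telmai, and the bucket and prefix names are placeholders.

```python
# Sketch: count objects written to each bin after a scan (AWS-S3 example).
# Bucket and prefixes are hypothetical; adjust to your configured paths.
import boto3

BUCKET = "example-bucket"           # hypothetical
VALID_PREFIX = "telmai/valid/"      # hypothetical Valid Data Path prefix
INVALID_PREFIX = "telmai/invalid/"  # hypothetical Invalid Data Path prefix

def count_objects(prefix: str) -> int:
    s3 = boto3.client("s3")
    total = 0
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET, Prefix=prefix):
        total += page.get("KeyCount", 0)
    return total

if __name__ == "__main__":
    print(f"valid: {count_objects(VALID_PREFIX)} objects, "
          f"invalid: {count_objects(INVALID_PREFIX)} objects")
```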

Note: For Data Binning to Azure Blob Storage, the SAS key used must have delete permissions.
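If you need to mint a SAS token with delete permission for the Azure-Blob case, one option is the azure-storage-blob Python SDK, as sketched below. The account, container, and key values are placeholders, and your organization may have its own preferred way to issue SAS tokens.

```python
# Sketch: generate a container SAS that includes delete permission,
# using the azure-storage-blob SDK. All identifiers below are placeholders.
from datetime import datetime, timedelta, timezone
from azure.storage.blob import ContainerSasPermissions, generate_container_sas

sas_token = generate_container_sas(
    account_name="exampleaccount",        # hypothetical storage account
    container_name="example-container",   # hypothetical container
    account_key="<storage-account-key>",  # placeholder secret
    permission=ContainerSasPermissions(read=True, write=True, list=True, delete=True),
    expiry=datetime.now(timezone.utc) + timedelta(days=30),
)
print(sas_token)
```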