Configuring Data Binning
Data binning enables you to automatically separate valid data from problematic data at the source, ensuring that only high-quality records flow through your data pipelines. By routing bad or suspicious data to a separate location for review, you can prevent costly downstream processing of invalid records and maintain pipeline efficiency.
Telmai's data binning feature monitors data correctness and categorizes records into two bins:
Valid Data: Records that meet all defined data quality rules and continue through your pipeline
Invalid Data: Records that violate quality rules and are flagged for review
This automated quality gate helps you maintain data integrity while reducing the cost of processing bad data through expensive transformation and analytics pipelines.
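Conceptually, the quality gate is a partition step: each record is checked against the configured rules and routed, by its ID attribute, into a valid or invalid bin. The sketch below is illustrative only; the rules, field names, and record structure are hypothetical and not Telmai's implementation.

```python
# Illustrative sketch of the two-bin quality gate (not Telmai's actual code).
# Records failing any rule are routed to the invalid bin for review instead
# of flowing downstream.

def is_valid(record):
    """Hypothetical rules: amount must be positive, email must be present."""
    return record.get("amount", 0) > 0 and bool(record.get("email"))

def bin_records(records):
    """Partition records into valid/invalid bins, keyed by the ID attribute."""
    bins = {"valid": [], "invalid": []}
    for record in records:
        bucket = "valid" if is_valid(record) else "invalid"
        bins[bucket].append(record["record_id"])
    return bins

records = [
    {"record_id": "r1", "amount": 42.0, "email": "a@example.com"},
    {"record_id": "r2", "amount": -5.0, "email": "b@example.com"},  # fails amount rule
    {"record_id": "r3", "amount": 10.0, "email": ""},               # fails email rule
]
print(bin_records(records))
```

Note that the ID attribute (here `record_id`) is what makes record-level routing possible, which is why configuring it is the first step below.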
Prerequisites
Before configuring data binning, ensure you have:
A connected data source in Telmai
Defined record validation rules for your dataset
A cloud storage destination (AWS S3, GCP Cloud Storage, or Azure Blob Storage) with appropriate permissions
Configuration Steps
Step 1: Configure ID Attribute
Set the ID attribute for your data source to enable record-level tracking. This unique identifier allows Telmai to categorize individual records into valid or invalid bins.
To configure:
Navigate to your connected asset settings
Locate the ID Attribute configuration
Select the column that uniquely identifies each record (e.g., record_id, transaction_id, user_id)
Step 2: Define Data Quality Monitors
Create one or more record validation monitors, as described in Record Validation Monitors. These monitors determine whether each record is valid or invalid.
Step 3: Enable Data Binning
Now you're ready to configure and activate data binning.
To enable:
Click the Configure Data Binning button

Select Correctness Monitor: Choose one or more of the correctness monitors you created in Step 2. These are the monitors/rules that define which records are considered valid vs. invalid.
Choose Storage Destination: Select your cloud storage type:
AWS S3
GCP Cloud Storage
Azure Blob Storage
Enter Storage Credentials: Provide the authentication credentials for your chosen storage destination. See Storage Permissions below for required access levels.
Define Storage Paths:
Valid Data Path: The destination path for records that pass all quality rules (e.g., s3://my-bucket/valid-data/)
Invalid Data Path: The destination path for records that fail quality rules (e.g., s3://my-bucket/invalid-data/)
Click Save to activate data binning
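With the example paths above, the routing rule reduces to: passing records land under the valid-data prefix, failing records under the invalid-data prefix. A minimal sketch of that rule (the helper name and bucket paths are illustrative, not part of Telmai's API):

```python
# Illustrative routing of a record to a storage prefix based on its
# validation outcome (helper name and paths are examples, not Telmai's API).

VALID_PATH = "s3://my-bucket/valid-data/"
INVALID_PATH = "s3://my-bucket/invalid-data/"

def destination_prefix(record_is_valid: bool) -> str:
    """Return the storage prefix a record is binned to."""
    return VALID_PATH if record_is_valid else INVALID_PATH

print(destination_prefix(True))   # s3://my-bucket/valid-data/
print(destination_prefix(False))  # s3://my-bucket/invalid-data/
```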
Step 4: Verify Binning Activation
Once enabled, data binning automatically takes effect during your next data scan job. Telmai will begin routing records to the appropriate storage paths based on your correctness policy.
To verify:
Wait for the next scheduled scan or trigger a manual scan
Check both the valid and invalid data paths in your cloud storage
Review the binning results in the Telmai monitoring dashboard
Storage Permissions
AWS S3
Ensure your AWS credentials have the following permissions:
s3:PutObject - Write objects to buckets
s3:DeleteObject - Remove objects from buckets
s3:ListBucket - List bucket contents
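A minimal IAM policy granting these actions might look like the following sketch (the bucket name is a placeholder; scope the resources to the bucket that holds your binning paths):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::my-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::my-bucket"
    }
  ]
}
```

Note that s3:ListBucket applies to the bucket ARN itself, while the object-level actions apply to objects under the bucket (the `/*` resource).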
GCP Cloud Storage
Your service account must have the following IAM roles:
Storage Legacy Bucket Writer - Write access to buckets
Storage Legacy Object Owner - Full control over objects
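As a hedged example, granting these roles to a service account via an IAM policy binding could look like the fragment below (the service-account and project names are placeholders):

```json
{
  "bindings": [
    {
      "role": "roles/storage.legacyBucketWriter",
      "members": ["serviceAccount:telmai-binning@my-project.iam.gserviceaccount.com"]
    },
    {
      "role": "roles/storage.legacyObjectOwner",
      "members": ["serviceAccount:telmai-binning@my-project.iam.gserviceaccount.com"]
    }
  ]
}
```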
Azure Blob Storage
Your SAS token or service principal must have:
Write permissions - Create and write blobs
Delete permissions - Remove blobs (required for data binning operations)
Feature Limitations
Group Assets: Data binning is not currently supported for assets belonging to a group. Configure binning at the individual asset level.
Data Diff Compatibility: The Data Diff feature is not available for assets with data binning enabled.
Scan Frequency: Binning operates during scheduled scans. Ensure your scan frequency aligns with your data freshness requirements.