Data Binning
Telmai offers a data binning feature to help manage data quality issues at the source. This feature allows you to define a policy where Telmai monitors data correctness and categorizes your data into "good" and "bad" bins. Good data continues through your pipeline, while bad or suspicious data is flagged for review.
This helps ensure that only good (or expected) data flows into your ecosystem, so that your costly pipelines run only on healthy datasets.
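To make the idea of "good" and "bad" bins concrete, here is a minimal Python sketch of splitting records by correctness rules. It is an illustration only, not Telmai's implementation or API: the function `bin_records`, the rule format, and the sample attributes (`email`, `age`) are all assumptions made for this example.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, object]
Rule = Callable[[object], bool]  # hypothetical rule shape: value -> is it correct?


def bin_records(
    records: Iterable[Record], rules: Dict[str, Rule]
) -> Tuple[List[Record], List[Record]]:
    """Split records into (good, bad) bins.

    A record is "good" only if every rule passes for its attribute;
    a single incorrect attribute sends the whole record to the bad bin.
    """
    good: List[Record] = []
    bad: List[Record] = []
    for record in records:
        is_correct = all(rule(record.get(attr)) for attr, rule in rules.items())
        (good if is_correct else bad).append(record)
    return good, bad


# Hypothetical expectation rules, keyed by attribute name.
rules: Dict[str, Rule] = {
    "email": lambda v: isinstance(v, str) and "@" in v,
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
}

# Sample data; "id" plays the role of the ID attribute.
records: List[Record] = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": "not-an-email", "age": 29},
    {"id": 3, "email": "c@example.com", "age": -5},
]

good, bad = bin_records(records, rules)
print("good ids:", [r["id"] for r in good])
print("bad ids:", [r["id"] for r in bad])
```

In this sketch, records 2 and 3 each fail one rule and land in the bad bin, while record 1 continues as good data.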
To enable this feature for a connected data source, you will need to:
Configure the ID Attribute: Set the ID attribute for the data source
Set Data Expectation Rules: Define rules on the Correctness Rules page
Create a Correctness Policy: Navigate to the Alert page and create a correctness policy. This policy defines the scope of the data binning
You can now set your Data Binning policy.
Click , a prompt will ask you to define the policy details:
Select the previously created correctness policy
Pick the desired bucket type (AWS-S3, GCP-Storage or Azure-Blob)
Once selected, you will need to enter the credentials
You will then need to define:
Valid Data Path: Path for good data (correct data)
Invalid Data Path: Path for bad data (incorrect data)
This binning will automatically take effect in your next data processing job.
Note: Data binning only happens when one or more attributes have incorrect values. Otherwise, the policy is not applied.
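The policy details and the note above can be sketched in Python as follows. This is a hedged illustration under stated assumptions, not Telmai's configuration format: `BinningPolicy`, `plan_outputs`, and the example bucket paths are hypothetical names, and returning no outputs when nothing is incorrect is one reading of "the policy is not applied".

```python
from dataclasses import dataclass
from typing import Dict, List

# Bucket types as listed in the policy prompt.
BUCKET_TYPES = {"AWS-S3", "GCP-Storage", "Azure-Blob"}


@dataclass(frozen=True)
class BinningPolicy:
    """Hypothetical container for the fields the prompt asks for."""

    bucket_type: str
    valid_data_path: str    # path for good (correct) data
    invalid_data_path: str  # path for bad (incorrect) data

    def __post_init__(self) -> None:
        if self.bucket_type not in BUCKET_TYPES:
            raise ValueError(f"unsupported bucket type: {self.bucket_type}")
        if self.valid_data_path == self.invalid_data_path:
            raise ValueError("valid and invalid data paths must differ")


def plan_outputs(
    good: List[dict], bad: List[dict], policy: BinningPolicy
) -> Dict[str, List[dict]]:
    """Map each destination path to the records that would be written there.

    Assumption: if no record has an incorrect value, binning is skipped
    and nothing is written to either path.
    """
    if not bad:
        return {}
    return {policy.valid_data_path: good, policy.invalid_data_path: bad}


# Example paths are placeholders, not real buckets.
policy = BinningPolicy(
    bucket_type="AWS-S3",
    valid_data_path="s3://example-bucket/valid/",
    invalid_data_path="s3://example-bucket/invalid/",
)

print(plan_outputs([{"id": 1}], [], policy))           # all correct: no binning
print(plan_outputs([{"id": 1}], [{"id": 2}], policy))  # one bad record: both bins
```

Keeping the two paths distinct in the sketch mirrors the purpose of the feature: downstream jobs can read the valid path without ever seeing records from the invalid one.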