Data Binning
Keep bad data from spreading by automating the process of separating good data from bad data
Data Binning is a feature where Telmai monitors your data for correctness and splits it into good and bad data. Good data can continue to be used within your pipeline, while bad or suspicious data is set aside so it can be accessed and reviewed. This helps ensure that only good (or expected) data flows into your ecosystem.
To enable this feature, you will need to:
Connect a data source
Configure the ID attribute for the data source
Set data expectation rules on the Business Rules page
Create a correctness policy on the Alert Policy page. This policy will only be used for scoping
You are now able to set your Data Binning policy.
Once you click Enable Data Binning, a prompt will ask you to define the policy details:
Specify the correctness policy
Bucket Type (AWS-S3, GCP-Storage or Azure-Blob)
Once selected, you will need to enter the credentials
You will then need to define:
Valid Data Path: Path for good data (correct data)
Invalid Data Path: Path for bad data (incorrect data)
This binning will automatically take effect in your next run.
Data binning only happens when one or more attributes have incorrect values. Otherwise, the policy is not applied.
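To make the idea concrete, here is a minimal Python sketch of splitting records into valid and invalid bins based on per-attribute expectations. It is an illustration only, not Telmai's implementation or API: the `RULES` dictionary, the `failed_attributes` and `bin_records` helpers, and the sample records are all hypothetical, and it assumes that a record is routed to the invalid bin when at least one of its attributes fails a rule.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
Rule = Callable[[Any], bool]

# Hypothetical expectation rules, standing in for rules that would be
# defined on the Business Rules page. Each maps an attribute to a check.
RULES: Dict[str, Rule] = {
    "email": lambda v: isinstance(v, str) and "@" in v,
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
}


def failed_attributes(record: Record, rules: Dict[str, Rule]) -> List[str]:
    """Return the names of attributes whose values break their rule."""
    return [attr for attr, is_valid in rules.items() if not is_valid(record.get(attr))]


def bin_records(
    records: List[Record], rules: Dict[str, Rule]
) -> Tuple[List[Record], List[Record]]:
    """Split records into (valid, invalid).

    Assumption: a record is invalid if one or more attributes fail.
    """
    valid: List[Record] = []
    invalid: List[Record] = []
    for record in records:
        if failed_attributes(record, rules):
            invalid.append(record)
        else:
            valid.append(record)
    return valid, invalid


# Sample data; "id" plays the role of the configured ID attribute.
records = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": "not-an-email", "age": 29},
    {"id": 3, "email": "c@example.com", "age": -5},
]

valid, invalid = bin_records(records, RULES)
print([r["id"] for r in valid])    # [1]
print([r["id"] for r in invalid])  # [2, 3]
```

In this sketch, the `valid` list corresponds to what would land under the Valid Data Path and the `invalid` list to what would land under the Invalid Data Path. If no record fails a rule, `invalid` is empty and there is nothing to separate.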