Data Binning

Keep bad data from spreading by automatically separating good data from bad data.

Data Binning is a feature where Telmai monitors the correctness of your data and splits it into good and bad data. Good data can continue to be used within your pipeline, while bad or suspicious data is set aside where it can be accessed and reviewed. This helps ensure that only good (or expected) data flows into your ecosystem.

To enable this feature, you will need to:

  1. Connect a data source

  2. Configure an ID attribute for the data source

  3. Set data expectation rules on the Business Rules page

  4. On the Alert Policy page, create a correctness policy. This policy is used only for scoping

  5. Set your Data Binning policy

Once you click Enable Data Binning, a prompt will ask you to define the policy details:

  1. Specify the correctness policy

  2. Select the bucket type (AWS-S3, GCP-Storage or Azure-Blob)

    • Once selected, enter the credentials for the bucket

  3. Define the output paths:

    • Valid Data Path: Path for good data (correct data)

    • Invalid Data Path: Path for bad data (incorrect data)

Binning takes effect automatically on your next run.

Data binning only happens when one or more attributes have incorrect values. Otherwise, the policy is not applied.
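
To make the idea concrete, here is a minimal Python sketch of what splitting data by correctness can look like. It is an illustration only, not Telmai's implementation: the function bin_records, the two example rules, and the sample records are all hypothetical, and the real feature is configured through the UI steps above rather than in code.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Rule = Callable[[Record], bool]  # returns True when the record meets the expectation


def bin_records(records: List[Record], rules: List[Rule]) -> Tuple[List[Record], List[Record]]:
    """Split records into (valid, invalid); a record is valid only if every rule passes."""
    valid: List[Record] = []
    invalid: List[Record] = []
    for record in records:
        if all(rule(record) for rule in rules):
            valid.append(record)
        else:
            invalid.append(record)
    return valid, invalid


# Hypothetical expectation rules, standing in for rules set on the Business Rules page.
def age_in_range(record: Record) -> bool:
    age = record.get("age")
    return isinstance(age, int) and 0 <= age <= 120


def email_present(record: Record) -> bool:
    return bool(record.get("email"))


# Hypothetical sample data; "id" plays the role of the configured ID attribute.
records = [
    {"id": 1, "age": 34, "email": "a@example.com"},
    {"id": 2, "age": -5, "email": "b@example.com"},
    {"id": 3, "age": 51, "email": ""},
]

valid, invalid = bin_records(records, [age_in_range, email_present])
valid_ids = [r["id"] for r in valid]
invalid_ids = [r["id"] for r in invalid]
```

In this sketch, the valid list corresponds to what would be written to the Valid Data Path and the invalid list to the Invalid Data Path. If the invalid list came back empty, there would be nothing to separate, which mirrors the case where the policy is not applied.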
