Data Quality Rules

Users can define expectations for attribute values, such as ensuring values in a column fall within a specific range (e.g., between 1 and 100). If an attribute value does not meet the defined expectation, it is marked as incorrect and tracked as part of the Correctness KPI.To define expectations on a given attribute, you will need to:

Defining Expectations

To set expectations for an attribute:

  1. Navigate to the “Correctness Rules” page.

  2. Select the Dataset: Choose the dataset for which you want to add expectations.

  3. Add an Attribute: Click the + button to select the attribute you want to set expectations for.

  4. Add Expectations:

    • Click the + button under the selected attribute to define expectations.

    • You’ll be prompted to specify:

      • Expectation Type

      • Expectation Metadata

  5. Repeat Steps 3 & 4 as needed.

Telmai will begin monitoring these expectations as part of the Correctness metric in future scans.

Here is the list of possible expectation rules you can define:

Rule

Description

Example

Accepted Compressed Patterns

List of allowed compressed patterns

“LD”

Accepted Expanded Patterns

List of allowed expanded patterns

“LLLDDD”

Accepted Values

List of allowed values

“Aa” “Bb”

“Cc”

Date Time Range

Allowed time range

From: 1d - To: 1M From: 1d - To: @now

Empty Values

(note this specific case is handled for completeness)

Values to be treated as empty

“_EMPTY_”

“null”

Frequency

Frequency is how many times a distinct value is expected to occur

From: 0 - To: INF

Is Date

Value is ISO Date (yyyy-mm-dd)

True/False

Is DateTime

Value is ISO Datetime (yyyy-mm-ddTHH:MM:SSZ)

True/False

Length

String length

From: 5 - To: 5

Numeric Value Range

Value range

From: 0 - To: 100

Regex

Regular expression

“^helloWorld\d”

Rejected Compressed Patterns

List of not allowable compressed patterns

“LD”

Rejected Expanded Patterns

List of not allowable expanded patterns

“LLLDDD”

Rejected Values

Not allowable values

“Aa” “Bb”

“Cc”

Spaces Count

Count number of spaces

From 0 - To: 10

Special Characters Count

Count of special characters in the value

From: 0 - To: 10

Tokens Count

Count of tokens in the value, splitted by any number of spaces or punctuations

From: 0 - To: 10

Trimmed String

If the string has no leading and trailing spaces

True/False

Value Is Alpha

Boolean check for alphabetical letter only

True/False

Value Is Digit

Boolean check for single digit only

True/False

Value Is Email

Boolean check for valid email string format

True/False

Value Is Number

Boolean check for numerical value

True/False

Value Is URL

Boolean check for valid URL string format

True/False

Words Count

Number of words, i.e split by whitespace characters

From 5 - To 10

Multi-Attribute Rules

Some rules may require combining values from multiple attributes. Telmai offers a Compound Attributes feature, which allows you to define an SQL-like expression, validate it in real-time, inspect outputs, and define expectations, similar to single-attribute rules.

Adding a Compound Attribute

  1. Navigate to the Configuration page and select the corresponding dataset.

  2. Click the "Add Compound Attribute" button

  3. Define the Attribute:

    • Attribute ID: Name the compound attribute.

    • Attributes to Compound: List the attributes to be used.

    • Expression: Enter an SQL-like expression that defines the transformation.

  4. Enter an expression. Please see below for additional information on supported functions and examples of expressions

You can validate the expression output by entering values for each corresponding attribute. When clicking on the edit field, Telmai will offer a few samples taken from the actual data to help

Once the attribute is created, you can add an expectation, as shown in the section above for the single attribute.

Important Usage Notes:

  • As the user inputs an expression, an evaluation is done to ensure a valid SQL expression

  • There are some cases where the validation will pass but will fail during executions due to casting-related errors. The values should be cast appropriately to the expected type to mitigate this. The expression output must always be a string type. In case it is not, users need to cast to a string explicitly. Example:

    • If the expression has contains function; the output is boolean contains( last_name , 'Smith')

      The user needs to cast the value string cast ( contains( last_name , 'Smith') as string)

  • All parameters are treated as strings. The user needs to cast the parameters to the corresponding function param type. Example:

    • If the user wants to use the date_add function, which takes in number and a date (which is a string) The user will need to cast my_number first like this: date_add('2016-07-30', cast (my_number as int))

Examples

Example 1:

LENGTH(CONCAT_WS(\",\", firstname, lastname)) > 100

The output expression is boolean, so it needs to be converted to string in order to become a value of a new compound attribute:

CAST(LENGTH(CONCAT_WS(\",\", firstname, lastname)) > 100 as STRING)

Example 2:

IF( DATEDIFF(current_timestamp(), orderdate) > 10 , true , false)

Convert boolean to string values:

IF( DATEDIFF(current_timestamp(), orderdate) > 10 , 'true' , 'false' )

Example 3:

IF( ISNOTNULL(country) and ISNOTNULL(city) and (price <= 100)), 'true', 'false')

price is an attribute, so it’s value comes as a string, so it needs to be first cast to double

IF( ISNOTNULL(country) and ISNOTNULL(city) and (CAST(price as DOUBLE) <= 100)),'true', 'false')

Last updated