Data Quality Rules
Last updated
Last updated
Users can define expectations for attribute values, such as ensuring values in a column fall within a specific range (e.g., between 1 and 100). If an attribute value does not meet the defined expectation, it is marked as incorrect and tracked as part of the Correctness KPI.To define expectations on a given attribute, you will need to:
Defining Expectations
To set expectations for an attribute:
Navigate to the “Correctness Rules” page.
Select the Dataset: Choose the dataset for which you want to add expectations.
Add an Attribute: Click the + button to select the attribute you want to set expectations for.
Add Expectations:
Click the + button under the selected attribute to define expectations.
You’ll be prompted to specify:
Expectation Type
Expectation Metadata
Repeat Steps 3 & 4 as needed.
Telmai will begin monitoring these expectations as part of the Correctness metric in future scans.
Here is the list of possible expectation rules you can define:
Some rules may require combining values from multiple attributes. Telmai offers a Compound Attributes feature, which allows you to define an SQL-like expression, validate it in real-time, inspect outputs, and define expectations, similar to single-attribute rules.
Navigate to the Configuration page and select the corresponding dataset.
Click the "Add Compound Attribute" button
Define the Attribute:
Attribute ID: Name the compound attribute.
Attributes to Compound: List the attributes to be used.
Expression: Enter an SQL-like expression that defines the transformation.
Enter an expression. Please see below for additional information on supported functions and examples of expressions
You can validate the expression output by entering values for each corresponding attribute. When clicking on the edit field, Telmai will offer a few samples taken from the actual data to help
Once the attribute is created, you can add an expectation, as shown in the section above for the single attribute.
Important Usage Notes:
As the user inputs an expression, an evaluation is done to ensure a valid SQL expression
There are some cases where the validation will pass but will fail during executions due to casting-related errors. The values should be cast appropriately to the expected type to mitigate this. The expression output must always be a string type. In case it is not, users need to cast to a string explicitly. Example:
If the expression has contains function; the output is boolean contains( last_name , 'Smith')
The user needs to cast the value string cast ( contains( last_name , 'Smith') as string)
All parameters are treated as strings. The user needs to cast the parameters to the corresponding function param type. Example:
If the user wants to use the date_add function, which takes in number and a date (which is a string) The user will need to cast my_number
first like this: date_add('2016-07-30', cast (my_number as int))
Examples
Example 1:
The output expression is boolean, so it needs to be converted to string in order to become a value of a new compound attribute:
Example 2:
Convert boolean to string values:
Example 3:
price is an attribute, so it’s value comes as a string, so it needs to be first cast to double
Rule
Description
Example
Accepted Compressed Patterns
List of allowed compressed patterns
“LD”
Accepted Expanded Patterns
List of allowed expanded patterns
“LLLDDD”
Accepted Values
List of allowed values
“Aa” “Bb”
“Cc”
Date Time Range
Allowed time range
From: 1d - To: 1M From: 1d - To: @now
Empty Values
(note this specific case is handled for completeness)
Values to be treated as empty
“_EMPTY_”
“null”
Frequency
Frequency is how many times a distinct value is expected to occur
From: 0 - To: INF
Is Date
Value is ISO Date (yyyy-mm-dd)
True/False
Is DateTime
Value is ISO Datetime (yyyy-mm-ddTHH:MM:SSZ)
True/False
Length
String length
From: 5 - To: 5
Numeric Value Range
Value range
From: 0 - To: 100
Regex
Regular expression
“^helloWorld\d”
Rejected Compressed Patterns
List of not allowable compressed patterns
“LD”
Rejected Expanded Patterns
List of not allowable expanded patterns
“LLLDDD”
Rejected Values
Not allowable values
“Aa” “Bb”
“Cc”
Spaces Count
Count number of spaces
From 0 - To: 10
Special Characters Count
Count of special characters in the value
From: 0 - To: 10
Tokens Count
Count of tokens in the value, splitted by any number of spaces or punctuations
From: 0 - To: 10
Trimmed String
If the string has no leading and trailing spaces
True/False
Value Is Alpha
Boolean check for alphabetical letter only
True/False
Value Is Digit
Boolean check for single digit only
True/False
Value Is Email
Boolean check for valid email string format
True/False
Value Is Number
Boolean check for numerical value
True/False
Value Is URL
Boolean check for valid URL string format
True/False
Words Count
Number of words, i.e split by whitespace characters
From 5 - To 10