Data Quality Metrics
Telmai enables users to monitor a wide range of data quality metrics and receive alerts when anomalies or issues are detected. Alerts serve as warnings that something may not be as expected. Some alerts can be configured to trigger notifications, while others are displayed in the Telmai UI for informational purposes. Additionally, some alerts may initiate remediation actions, such as segregating good data from bad, triggering a circuit breaker for the pipeline, and more.
What is monitored?
Out-of-the-Box Metrics: Telmai automatically monitors a variety of predefined metrics related to table metadata and health KPIs (see Data Health KPIs)
Custom Metrics: Users can define their own metrics to monitor specific data quality concerns.
Each monitored metric is validated against a set of policies, which define thresholds, scope, notification settings, and more. If a metric violates a policy, an alert is generated. The flow diagram below illustrates how alerts are created.
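As a rough illustration of how a metric value might be checked against a policy, here is a minimal Python sketch. The `Policy` and `Alert` classes, their fields, and the `evaluate` function are hypothetical names invented for this example. They are not Telmai's API or schema, and real policies also cover scope and other settings that are not modeled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Policy:
    """Hypothetical policy: an allowed range for one metric (not Telmai's schema)."""
    metric: str
    lower: Optional[float] = None  # None means no lower bound
    upper: Optional[float] = None  # None means no upper bound
    notify: bool = False           # whether a violation should also send a notification


@dataclass
class Alert:
    """Hypothetical alert produced when a metric violates a policy."""
    metric: str
    value: float
    message: str
    notify: bool


def evaluate(policy: Policy, value: float) -> Optional[Alert]:
    """Return an Alert if the value falls outside the policy bounds, else None."""
    if policy.lower is not None and value < policy.lower:
        message = f"{policy.metric}={value} is below {policy.lower}"
        return Alert(policy.metric, value, message, policy.notify)
    if policy.upper is not None and value > policy.upper:
        message = f"{policy.metric}={value} is above {policy.upper}"
        return Alert(policy.metric, value, message, policy.notify)
    return None


# Assumed example: completeness must stay at or above 95 percent.
completeness_policy = Policy(metric="completeness", lower=95.0, notify=True)
ok = evaluate(completeness_policy, 99.2)         # within bounds, no alert
violation = evaluate(completeness_policy, 80.0)  # below the threshold, alert
```

In this sketch a value inside the bounds yields no alert, while a value outside them yields an alert that carries the policy's notification setting.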

Out-of-the-Box Metrics
Table Level Freshness: Time elapsed since the last change in the monitored table at the time of the scan
Record Count: Number of records being scanned
Total Table Records Count: Total number of records in the monitored table. This may be larger than the record count if the scan involves only a subset of data, such as when delta detection is configured
Correctness: Percentage of records that meet defined data quality rules
Completeness: Percentage of records with non-null/non-empty values
Record ID Uniqueness: Percentage of unique records based on the configured ID attribute.
Uniqueness: Percentage of unique values within an attribute
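To make the percentage-based definitions above concrete, the following Python sketch computes completeness and uniqueness over a small in-memory list of records. The exact formulas Telmai uses are not stated here, so the denominators are assumptions: completeness is taken over all records, and uniqueness is taken as distinct non-empty values divided by all non-empty values. The function names and sample data are hypothetical.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def completeness(records: List[Record], attribute: str) -> float:
    """Percentage of records whose attribute is neither None nor an empty string."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(attribute) not in (None, ""))
    return 100.0 * filled / len(records)


def uniqueness(records: List[Record], attribute: str) -> float:
    """Percentage of distinct values among the non-empty values (assumed formula)."""
    values = [r.get(attribute) for r in records if r.get(attribute) not in (None, "")]
    if not values:
        return 0.0
    return 100.0 * len(set(values)) / len(values)


# Hypothetical sample data: one duplicated id and one missing city.
records = [
    {"id": 1, "city": "Paris"},
    {"id": 2, "city": "Paris"},
    {"id": 3, "city": "Rome"},
    {"id": 3, "city": None},
]

city_completeness = completeness(records, "city")  # 3 of 4 records have a city
id_uniqueness = uniqueness(records, "id")          # 3 distinct ids among 4 values
```

Applying `uniqueness` to the configured ID attribute gives a figure comparable to Record ID Uniqueness, under the same assumption about the denominator.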
User Defined Metrics
Users can create custom metrics to track specific anomalies. Telmai supports two types of custom metrics: Expression and SQL.
Metric as an Expression
This method lets users specify a simple aggregation with optional grouping. Example: SUM(`sales`) group by `region`, `country`
Attribute names must be wrapped in backticks (`)
The maximum number of group-by dimensions is 4
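The two constraints above (backtick-wrapped attribute names and at most four group-by dimensions) can be checked before an expression is pasted into the UI. The sketch below is a hypothetical helper and not part of Telmai: `quote` and `build_expression` are names made up for this example, and the helper only assembles a string.

```python
from typing import Sequence

MAX_GROUP_BY_DIMENSIONS = 4  # limit stated in the text above


def quote(attribute: str) -> str:
    """Wrap an attribute name in backticks, as the expression syntax requires."""
    if "`" in attribute:
        raise ValueError(f"attribute name must not contain a backtick: {attribute!r}")
    return f"`{attribute}`"


def build_expression(aggregation: str, attribute: str, group_by: Sequence[str] = ()) -> str:
    """Assemble an expression string such as SUM(`sales`) group by `region`."""
    if len(group_by) > MAX_GROUP_BY_DIMENSIONS:
        raise ValueError(
            f"at most {MAX_GROUP_BY_DIMENSIONS} group-by dimensions are allowed, "
            f"got {len(group_by)}"
        )
    expression = f"{aggregation}({quote(attribute)})"
    if group_by:
        expression += " group by " + ", ".join(quote(d) for d in group_by)
    return expression


print(build_expression("SUM", "sales", ["region", "country"]))
# SUM(`sales`) group by `region`, `country`
```

Passing five or more dimensions raises a `ValueError` in this sketch, which mirrors the documented limit without claiming how Telmai itself reports the error.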
Metrics from Raw SQL
Alternatively, users can write a raw SQL query that returns a single metric along with its dimensions. The first returned column must be numeric; this value is the tracked metric. The remaining columns are used as dimensions.
Applicable only to the following data connectors: BigQuery, Athena, Databricks, Trino, Snowflake, Redshift
Table name must be wrapped in backticks (`)
Use valid SQL syntax
Ensure the first selected column returns a numeric value
Example:
SELECT Emp_Salary, Emp_Region FROM `employee_table` WHERE Emp_Age > 60
Emp_Salary is the tracked metric
Emp_Region is the dimension
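The column convention described above (first column is the numeric metric, remaining columns are dimensions) can be tried locally before a query is registered. The sketch below uses Python's built-in `sqlite3` module as a stand-in for a real connector, which is an assumption: SQLite is not one of the supported connectors, and its SQL dialect differs from theirs. The sample rows and the `split_rows` helper are hypothetical, and an `ORDER BY` is added only to make the output order deterministic.

```python
import sqlite3

# In-memory stand-in for the monitored table (hypothetical sample data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee_table (Emp_Salary REAL, Emp_Region TEXT, Emp_Age INTEGER)"
)
conn.executemany(
    "INSERT INTO employee_table VALUES (?, ?, ?)",
    [(5200.0, "EMEA", 61), (4100.0, "APAC", 45), (6100.0, "AMER", 64)],
)

query = (
    "SELECT Emp_Salary, Emp_Region FROM `employee_table` "
    "WHERE Emp_Age > 60 ORDER BY Emp_Salary"
)


def split_rows(cursor):
    """Split each row into (metric, dimensions) and check that the metric is numeric."""
    results = []
    for row in cursor:
        metric, *dimensions = row
        if not isinstance(metric, (int, float)):
            raise TypeError(f"first column must be numeric, got {metric!r}")
        results.append((float(metric), tuple(dimensions)))
    return results


rows = split_rows(conn.execute(query))
print(rows)  # [(5200.0, ('EMEA',)), (6100.0, ('AMER',))]
```

A query whose first column is text fails the check in `split_rows`, which is the kind of mistake the rules above are meant to catch early.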
To add a new custom metric:
Select the Dataset: Choose the dataset you want to monitor
Navigate to “Alerting Policies”: Go to the “Metrics” tab
Add a Custom Metric:
Click the “+ Custom Metric” button
A new window will open, allowing you to define the metric
Select Type:
SQL for raw SQL syntax
Expression for an aggregation expression
Define the Metric:
Name: Enter a name for the metric
Description: Provide a brief explanation of the metric
Click Validate and Save to save the metric
Telmai will start monitoring this metric in future scans
Supported aggregations for Expressions
Below is the list of supported functions
min
max
count
avg
sum
distinct
variance
median
stddev
Aggregation Examples
// Count distinct age for different cities
count(distinct(`Age`)) group by `City`
// Sum of salaries over total count
sum(`salaries`)/count(*)
// Average age for each school and district
avg(`age`) group by `school`, `district`
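For readers who want to sanity-check what the first two expressions compute, the following plain-Python sketch reproduces them over a small hypothetical dataset. It only illustrates the arithmetic. It does not show how Telmai evaluates expressions, and the sample rows are invented.

```python
from collections import defaultdict

# Hypothetical sample rows.
rows = [
    {"City": "Paris", "Age": 30, "salaries": 1000},
    {"City": "Paris", "Age": 30, "salaries": 2000},
    {"City": "Paris", "Age": 41, "salaries": 3000},
    {"City": "Rome", "Age": 25, "salaries": 4000},
]

# Equivalent of: count(distinct(`Age`)) group by `City`
ages_by_city = defaultdict(set)
for row in rows:
    ages_by_city[row["City"]].add(row["Age"])
distinct_age_counts = {city: len(ages) for city, ages in ages_by_city.items()}

# Equivalent of: sum(`salaries`)/count(*)
salary_per_record = sum(row["salaries"] for row in rows) / len(rows)

print(distinct_age_counts)  # {'Paris': 2, 'Rome': 1}
print(salary_per_record)    # 2500.0
```

The grouped expression produces one value per dimension combination, while the ungrouped one produces a single value for the whole scan.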
SQL Examples
// Fetches salary and names of employees earning over 2000.
SELECT Emp_Salary, FirstName, LastName FROM `EmployeeTable` WHERE Emp_Salary > 2000;
// Counts records for each specialty and city combination in the json_3r_a table.
SELECT count(*), Primary_Specialty, Address.City FROM `json_3r_a` GROUP BY Primary_Specialty, Address.City
// Categorizes students by age group and lists their details, sorted by group.
SELECT CASE WHEN Age < 25 THEN 1 WHEN Age BETWEEN 25 AND 30 THEN 2 ELSE 3 END AS age_group, Name, Age, City FROM `students` ORDER BY age_group