Dataset Health Overview

This page describes the Overview page, its components, and how to get the most out of it.

Use the Overview page to understand the status of a given dataset and its attributes at a specific upload.

First, let's note the health KPIs this page displays:

  • Record count (Table): Number of records in a given upload.

  • Total record count (Table): Tracks changes in the total size of the monitored table. Available only when CDC is enabled (i.e., the “delta only” flag is on). If the flag is off, or the monitored source is not an SQL database, this KPI is set to N/A.

  • Table Freshness (Table): Time between the dataset upload time and the dataset update time. Requires the source dataset's metadata to support an update time.

  • Record Freshness (Table): Record-level freshness, defined by setting an expectation on a timestamp attribute in the data source (e.g., the “Record Update Date” attribute must be within one month of the current date). If no timestamp attribute is configured, this KPI is N/A.

  • Completeness (Attribute and Table): Percentage of records where the attribute value is not null, not empty, and not one of the user-defined placeholders (such as N/A).

  • Correctness (Attribute): Percentage of records where the attribute value meets all expectations set for the attribute. Correctness is calculated only for attributes that have expectations; otherwise it defaults to 100%.

  • Accuracy (Attribute): Ratio of business-metric dimensions where no drift was detected to the total number of dimensions. If multiple business metrics are defined for an attribute, the attribute's accuracy is the minimum across them.

  • Uniqueness (Attribute): Ratio of records with a unique ID to the total number of records.

  • Duplicates (Attribute): Count of duplicate values.

  • Unique (Attribute): Count of unique values.

  • Distinct (Attribute): Count of distinct values.

  • Empty (Attribute): Count of empty values.

  • Cardinality (Attribute): Cardinality ratio of the values, reported as Low, Med, or High.
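The attribute-level definitions above can be illustrated with a minimal Python sketch that computes them for one attribute's values in a single upload. This is not the product's implementation, and all function names are hypothetical. Several details the definitions leave open are assumptions, also marked in the comments: "empty" means `None` or a blank string; empty values are excluded from the Distinct, Unique, and Duplicates counts; Duplicates counts values that occur more than once; Uniqueness is read as the share of records whose value occurs exactly once; expectations are modeled as Python predicates; drift results are supplied as booleans; the Low/Med/High cardinality thresholds are hypothetical; and values are assumed to be hashable scalars.

```python
from collections import Counter
from typing import Callable, Sequence

# Example placeholder set; in the product, placeholders are user-defined.
PLACEHOLDERS = frozenset({"N/A"})


def is_empty(value) -> bool:
    # Assumption: "empty" means None or a string that is blank after stripping.
    return value is None or (isinstance(value, str) and not value.strip())


def _require_records(values: Sequence) -> None:
    if not values:
        raise ValueError("metrics are undefined for an upload with no records")


def completeness(values: Sequence, placeholders=PLACEHOLDERS) -> float:
    """Percent of records that are not null, not empty, and not a placeholder."""
    _require_records(values)
    filled = sum(1 for v in values if not is_empty(v) and v not in placeholders)
    return 100.0 * filled / len(values)


def correctness(values: Sequence,
                expectations: Sequence[Callable[[object], bool]]) -> float:
    """Percent of records meeting every expectation; 100% when none are set."""
    if not expectations:
        return 100.0
    _require_records(values)
    passed = sum(1 for v in values if all(check(v) for check in expectations))
    return 100.0 * passed / len(values)


def accuracy(drift_flags_per_metric: Sequence[Sequence[bool]]) -> float:
    """Minimum, across business metrics, of the share of dimensions without drift.

    Each inner sequence holds one flag per dimension (True = drift detected).
    Assumes at least one metric, each with at least one dimension.
    """
    return min(
        sum(1 for drifted in flags if not drifted) / len(flags)
        for flags in drift_flags_per_metric
    )


def count_metrics(values: Sequence) -> dict:
    """Empty, Distinct, Unique, and Duplicates counts, plus the Uniqueness ratio."""
    _require_records(values)
    # Assumption: empty values are left out of the distinct/unique/duplicate counts.
    counts = Counter(v for v in values if not is_empty(v))
    unique = sum(1 for c in counts.values() if c == 1)
    return {
        "empty": sum(1 for v in values if is_empty(v)),
        "distinct": len(counts),
        "unique": unique,
        # Assumption: Duplicates counts the values that occur more than once.
        "duplicates": sum(1 for c in counts.values() if c > 1),
        # A value that occurs exactly once belongs to exactly one record.
        "uniqueness": unique / len(values),
    }


def cardinality(values: Sequence, low: float = 0.1, high: float = 0.5) -> str:
    """Bucket the distinct-to-total ratio into Low/Med/High (hypothetical thresholds)."""
    _require_records(values)
    distinct = len({v for v in values if not is_empty(v)})
    ratio = distinct / len(values)
    if ratio < low:
        return "Low"
    return "Med" if ratio < high else "High"


if __name__ == "__main__":
    ages = [34, 34, None, "N/A", 51, 28]
    print(round(completeness(ages), 1))  # 66.7
    print(count_metrics(ages))
    # {'empty': 1, 'distinct': 4, 'unique': 3, 'duplicates': 1, 'uniqueness': 0.5}
    print(cardinality(ages))  # High
    print(round(correctness(ages, [lambda v: isinstance(v, int) and v >= 0]), 1))  # 66.7
    print(accuracy([[False, False, True], [False, True]]))  # 0.5
```

The sketch returns percentages where the definitions say "percent" (Completeness, Correctness) and 0-to-1 ratios where they say "ratio" (Accuracy, Uniqueness).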

Note: for all attribute-level metrics, the table-level value is calculated as the average across columns.
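As a sketch of the table-scoped values, the following Python snippet averages an attribute-level KPI across columns, as the note describes. It also computes Table Freshness, returning `None` (shown as N/A) when the source metadata exposes no update time. The function and column names are hypothetical. The sketch assumes a simple unweighted mean and takes freshness to be the upload time minus the source update time.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean
from typing import Mapping, Optional


def table_level(metric_by_attribute: Mapping[str, float]) -> float:
    """Table-level value of an attribute-level KPI: average across columns."""
    if not metric_by_attribute:
        raise ValueError("at least one attribute is required")
    # Assumption: an unweighted mean, so every column counts equally.
    return mean(metric_by_attribute.values())


def table_freshness(upload_time: datetime,
                    source_update_time: Optional[datetime]) -> Optional[timedelta]:
    """Table Freshness as upload time minus the source's update time (assumed direction)."""
    if source_update_time is None:
        # The source metadata does not support an update time, so the KPI is N/A.
        return None
    return upload_time - source_update_time


if __name__ == "__main__":
    # Hypothetical per-column completeness values for one upload.
    completeness = {"id": 100.0, "email": 80.0, "phone": 60.0}
    print(table_level(completeness))  # 80.0

    upload = datetime(2024, 5, 2, 12, 0, tzinfo=timezone.utc)
    updated = datetime(2024, 5, 1, 6, 0, tzinfo=timezone.utc)
    print(table_freshness(upload, updated))  # 1 day, 6:00:00
    print(table_freshness(upload, None))     # None
```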

Now, to see where these KPIs are displayed, let's go over the UI:

Selector Component

This component allows you to select:

  • Dataset

  • Attributes to calculate metrics on

  • Segments to filter on

  • Specific upload

Health KPIs Summary

This component provides a summary of the calculated metrics at the table and attribute scopes. Only the calculated

Attribute(s) KPIs

Finally, for every attribute (column), we calculate the corresponding health KPIs.
