Telmai Academy

Uniqueness

Last updated 3 years ago

The number of records that can be identified uniquely based on a predefined key.

Good to know: Uniqueness is the inverse of duplication. The fewer duplicate records a dataset contains, the higher its uniqueness.

Measuring Uniqueness

Prerequisite: a primary key must be defined for the dataset.

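Formula: uniqueness = primary_key_count / total_row_count, where primary_key_count is the number of distinct primary key values. The complement, 1 - primary_key_count / total_row_count, is the share of duplicate records.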

Common unit of measure: value count or percentage
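
As a minimal sketch of the measurement above, the Python below reads `primary_key_count` as the number of distinct key values, which is an assumption because the text does not define the term. Under that reading, `primary_key_count / total_row_count` is the share of uniquely identifiable rows, and `1 - primary_key_count / total_row_count` is the share of duplicates. The function name `uniqueness_metrics` and the sample keys are hypothetical.

```python
def uniqueness_metrics(keys):
    """Return count and ratio measures of uniqueness for a list of key values."""
    total_row_count = len(keys)
    if total_row_count == 0:
        raise ValueError("cannot measure uniqueness of an empty dataset")
    # Assumption: primary_key_count means the number of distinct key values.
    primary_key_count = len(set(keys))
    unique_ratio = primary_key_count / total_row_count
    return {
        "primary_key_count": primary_key_count,
        "total_row_count": total_row_count,
        "unique_ratio": unique_ratio,         # share of rows with a distinct key
        "duplicate_ratio": 1 - unique_ratio,  # share of rows that repeat a key
    }


# Hypothetical sample: key 22 appears twice among four rows.
metrics = uniqueness_metrics([22, 22, 37, 41])
print(metrics)
```

Reporting both the count and the ratio matches the two units of measure mentioned above: value count or percentage.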

Examples

Three different uniqueness problems can occur:

One record with one key value occurs more than once in a dataset (duplicate with identical key values). The two records are not unique.

Key | Student Name
22 | John snow
22 | John snow

Datastore constraints, such as a primary key or unique constraint, can often prevent this issue.
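
To illustrate how a datastore constraint can block this first kind of duplicate, here is a small sketch that uses Python's built-in `sqlite3` module with an in-memory database. The table name `students` and the column names `student_key` and `student_name` are hypothetical, and other datastores may enforce constraints differently.

```python
import sqlite3

# In-memory database, used for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (student_key INTEGER PRIMARY KEY, student_name TEXT)"
)
conn.execute("INSERT INTO students VALUES (22, 'John snow')")

try:
    # A second row with the same key violates the primary key constraint.
    conn.execute("INSERT INTO students VALUES (22, 'John snow')")
    rejected = False
except sqlite3.IntegrityError as exc:
    rejected = True
    print("duplicate rejected:", exc)

row_count = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]
print("rows stored:", row_count)
```

The constraint only protects against repeated key values. It does not help with the other two problems in this section.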

Multiple records with the same values occur in a dataset under different keys (duplicate with different key values). The object John is not unique in the dataset.

Key | Student Name
22 | John snow
37 | John snow
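
A key constraint cannot catch this case, because both keys are valid. One simple way to surface candidates is to group records by their non-key values and flag any value that appears under more than one key. The sketch below does this with a hypothetical helper `keys_by_value`. It assumes that names match after trimming whitespace and folding case, which is far cruder than real identity resolution, and a shared name does not prove that two records describe the same person.

```python
from collections import defaultdict


def keys_by_value(records):
    """Map each normalized name to the keys it appears under, keeping only repeats."""
    groups = defaultdict(set)
    for key, name in records:
        # Assumption: trimming and case-folding is enough to compare names.
        groups[name.strip().casefold()].add(key)
    return {name: sorted(keys) for name, keys in groups.items() if len(keys) > 1}


records = [(22, "John snow"), (37, "John snow")]
suspects = keys_by_value(records)
print(suspects)
```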

A record has the same key as another record, and both occur in a dataset (false duplicate). Key 22 is not unique.

Key | Student Name
22 | John snow
22 | Jane doe

Most often, users will need sophisticated master data management and identity resolution systems to resolve duplicates like these.
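
Detecting a false duplicate is simpler than resolving one. As a rough sketch, the hypothetical helper `conflicting_keys` below flags every key that is attached to more than one distinct value. The sample rows are illustrative and are not taken from a real dataset. Deciding which record should keep the key is left to the systems mentioned above.

```python
from collections import defaultdict


def conflicting_keys(records):
    """Return the keys that map to more than one distinct name."""
    names_by_key = defaultdict(set)
    for key, name in records:
        names_by_key[key].add(name)
    return {
        key: sorted(names)
        for key, names in names_by_key.items()
        if len(names) > 1
    }


# Hypothetical rows: key 22 is shared by two different students.
rows = [(22, "John snow"), (22, "Jane doe"), (37, "John snow")]
conflicts = conflicting_keys(rows)
print(conflicts)
```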

Related dimension: Consistency