Introduction: Indicators of Data Quality
This chapter focuses on six key data quality indicators: completeness, uniqueness, timeliness, validity, accuracy, and consistency.
Data Quality Indicators
In recent years, ensuring acceptable data quality has become harder with the rise of automated systems and complex data pipelines that record and process data of ever-growing volume, variety, velocity, and volatility.
Traditional data quality tools were designed for architectures that no longer exist.
The result is more manual work, or suboptimal workarounds that hurt the productivity and outcomes of DataOps teams.
Actively improving data quality is a challenge for every data team. How do we specify data quality, and how do we measure it? Typically, basic measurements are made along what the literature refers to as Data Quality Dimensions or Indicators.
Historically, many data quality dimensions have been adopted, such as Accuracy, Validity, Completeness, Consistency, Reliability, Timeliness, Uniqueness, Accessibility, Confidentiality, Relevance, and Integrity, among others.
However, neither their names nor their definitions have been standardized.
As a step toward standardization, DAMA NL Foundations surveyed more than 60 dimensions and published the results in DDQ-Research-2020. Out of these many dimensions, customers typically choose a small subset that is most critical to their needs.
These are referred to as the primary or critical dimensions. Although there is no universal agreement on which dimensions are primary, even among data quality professionals, six are widely accepted as such:
Completeness, Validity, Accuracy, Consistency, Uniqueness, and Timeliness.
This chapter focuses on these six widely used dimensions and their measurements, which we refer to as data quality indicators (DQIs).
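To make the idea of "measuring" an indicator concrete, here is a minimal Python sketch of two of the six, completeness and uniqueness, each expressed as a ratio between 0 and 1. The record format, the function names, and the exact formulas are illustrative assumptions, not definitions taken from DDQ-Research-2020; other formulations (for example, counting duplicated records rather than distinct values) are also in use.

```python
from typing import Any, Optional

# Hypothetical record format: a dict mapping field names to values,
# where a missing key or a None value counts as "missing".
Record = dict[str, Optional[Any]]


def completeness(records: list[Record], field: str) -> float:
    """Fraction of records in which `field` is present and not None."""
    if not records:
        return 1.0  # assumption: an empty dataset is treated as complete
    filled = sum(1 for r in records if r.get(field) is not None)
    return filled / len(records)


def uniqueness(records: list[Record], field: str) -> float:
    """Fraction of non-missing `field` values that are distinct.

    One possible formulation (distinct / non-missing); values must be hashable.
    """
    values = [r[field] for r in records if r.get(field) is not None]
    if not values:
        return 1.0  # assumption: no values means no duplicates
    return len(set(values)) / len(values)


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    rows: list[Record] = [
        {"id": 1, "email": "a@example.com"},
        {"id": 2, "email": None},
        {"id": 2, "email": "b@example.com"},
        {"id": 3},
    ]
    print(completeness(rows, "email"))  # 0.5  (2 of 4 records have an email)
    print(uniqueness(rows, "id"))       # 0.75 (3 distinct ids among 4 values)
```

Both functions return 1.0 for a perfect score, which makes them easy to track over time or compare against a threshold. The other four indicators (timeliness, validity, accuracy, and consistency) generally need extra context, such as timestamps, validation rules, or a trusted reference source, so they do not reduce to a single-column ratio as neatly.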