Introduction: Indicators of Data Quality
This chapter focuses on six key data quality indicators: completeness, uniqueness, timeliness, validity, accuracy and consistency.
Data Quality Indicators
In recent years, the challenge of ensuring acceptable data quality has escalated with the rise of automated systems and complex data pipelines that record and process data of ever-growing volume, variety, velocity and volatility.
Traditional data quality tools were designed for data architectures that no longer exist.
As a result, DataOps teams resort to increased manual labor or sub-optimal solutions, which hurts both their productivity and their outcomes.
The first problem every data team faces when it sets out to actively improve data quality is this: how do we specify data quality, and how do we measure it? Basic measurements are usually made along what the literature refers to as Data Quality Dimensions.
Many dimensions have been introduced over the years, among them Accuracy, Validity, Completeness, Consistency, Reliability, Timeliness, Uniqueness, Accessibility, Confidentiality, Relevance and Integrity. However, there is no standard for their names or definitions.
In an attempt to move toward greater standardization, DAMA NL Foundations conducted a comprehensive survey of more than 60 dimensions and published it as DDQ-Research-2020. From these many dimensions, an industry usually selects a small subset that is most critical to its needs.
These are referred to as the primary, or key, dimensions. Although there is no universal agreement on the primary dimensions, even among data quality professionals, six dimensions are widely accepted as primary.
These are Completeness, Validity, Accuracy, Consistency, Uniqueness, and Timeliness.
This chapter focuses on these six widely used dimensions and on their measurements, which we refer to as data quality indicators (DQI).
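To make the idea of an indicator concrete, here is a minimal sketch that expresses three of the six dimensions as simple ratios over a single column of values. The function names and the exact ratio definitions (treating `None` as missing, scoring an empty column as 1.0, and letting the caller supply the validity rule) are illustrative assumptions, not necessarily the definitions this chapter adopts. Accuracy, consistency and timeliness are left out because they need reference data, related records or timestamps rather than a single column.

```python
from typing import Any, Callable, Optional, Sequence


def completeness(values: Sequence[Optional[Any]]) -> float:
    """Hypothetical indicator: share of entries that are present (not None)."""
    if not values:
        return 1.0  # assumption: an empty column counts as trivially complete
    present = sum(1 for v in values if v is not None)
    return present / len(values)


def uniqueness(values: Sequence[Optional[Any]]) -> float:
    """Hypothetical indicator: share of present entries that are distinct.

    Values must be hashable because they are collected into a set.
    """
    present = [v for v in values if v is not None]
    if not present:
        return 1.0  # assumption: nothing present means nothing duplicated
    return len(set(present)) / len(present)


def validity(values: Sequence[Optional[Any]], is_valid: Callable[[Any], bool]) -> float:
    """Hypothetical indicator: share of present entries that pass a caller-supplied rule."""
    present = [v for v in values if v is not None]
    if not present:
        return 1.0
    return sum(1 for v in present if is_valid(v)) / len(present)


# Illustrative column: one missing value, one duplicate, one malformed entry.
emails = ["a@x.org", None, "b@x.org", "a@x.org", "not-an-email"]
print(completeness(emails))                  # 4 of 5 present   -> 0.8
print(uniqueness(emails))                    # 3 distinct of 4  -> 0.75
print(validity(emails, lambda e: "@" in e))  # 3 valid of 4     -> 0.75
```

Expressing each indicator as a ratio in [0, 1] is one common design choice: it makes indicators comparable across columns and datasets of different sizes, and it lets a team set simple thresholds (for example, "completeness must stay above 0.95") without changing the underlying measurement.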