# Completeness

Completeness is defined as not-null values but often can go beyond not-null values. When creating a completeness score, we recommend reviewing the following ways to measure it.

Data is populated at a specified level (refer to below); this can be calculated using simple null checks at your datastore level. Data is populated with values that exclude the template/null proxies. Examples of null proxies : 000-000-000, N/A, none, not-defined, Not-applicable , NA

The standard unit of Measure: percentage(%) Examples Example(s): Customer table of 3M customers has 2.94M populated emails.&#x20;

Then completeness of Emails is 2.94/3 x 100 = 98%&#x20;

**Industry sample(Uber Engineering) : Completeness**

| Prerequisite                                        | Formula                                                                        |
| --------------------------------------------------- | ------------------------------------------------------------------------------ |
| Upstream and downstream datasets are 1:1 mapped     | *downstream\_row\_count / upstream\_row\_count > completenessSLA*              |
| Offline job to ingest sampled upstream data to Hive | *sampled\_row\_count\_in\_hive / total\_sampled\_row\_count > completenessSLA* |

## Different levels of measurement

**Dataset/Database:** Total data present across all the tables in the entire dataset or database. Expected values are usually calculated against historical data or against other data sources in the pipeline.

**Table/schema :** Total data populated in specific tables like Customer, Order, or transactions. Expected values are usually calculated against historical data or against other data sources in the pipeline.

![Incomplete data compared between data storage](/files/YbYD6XtxbLNcZw1i2aNM)

**Metadata:** The degree to which both technical and business metadata information is populate&#x64;**.** This is usually measured as not-null values for metadata like descriptions, security tags, creation date etc in your data catalog.&#x20;

**Record :** Total number records in the dataset. Expected values can be measured against historic trends or lineage across various data sources in the pipeline.&#x20;

For example, based on historical trends users expect 1M records but receive only 500k causing  incomplete records. Or  BigQuery has 1M records whereas Cassandra  has only 800k records, indicating the records are incomplete.

**Attributes:** Total number of attributes in a schema definition.Expected value can be specified by business requirements. This is a useful indicator to ensure schema is complete across multiple data sources in your data pipeline and can be tracked through lineage of schema.

For Example, Customer table in BigQuery has 18 attributes, whereas Cassandra  has only 16 attributes, often classified as schema mismatch too.

Value: Total numbers of populated attribute values across all rows. For example, if a table has 10 rows with 5 attributes, this will indicates how many values are populated across these 50 value&#x73;**.**

\
**An attribute value:** Total numbers of populated attribute values for a specific attribute.

For example, if a table has 10 rows with 5 attributes one such attribute is email, this will indicate how many out of the 10 records have email’s populated.

<br>


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.telm.ai/academy/data-quality-indicators/completeness.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
