Data Quality External Reporting

This page explains the Centralized DQ Monitor Scan Reporting feature, which generates a comprehensive, standardized report for every Data Quality (DQ) Monitor scan and appends the results to a designated external table. This mechanism centralizes DQ metrics from various sources and monitors, providing a unified, historical view of data quality performance.


Report Structure (External Destination Table Schema)

The external table is the central repository for all DQ scan results. It provides a standardized format that allows users to query and analyze DQ performance across all projects, data assets, and monitors.

| Column Name | Data Type | Description |
|---|---|---|
| project_id | Long | Identifier for the project where the data asset resides. |
| data_asset_id | String | Identifier for the specific data asset (e.g., table/stream) being monitored. |
| monitor_Id | Long | Unique identifier for the Monitor that performed the check. This is the primary key for tracking the rule execution. |
| scan_timestamp | Timestamp | The exact time the DQ job completed execution. |
| total_records_failed | Integer | Count of records that failed the specific Monitor's check. |
| total_records_scanned | Integer | Total count of records processed by the job for this check. |
| record_id_attribute_name | String | The name of the primary key/ID attribute used to uniquely identify records in the data asset. |
| record_id_sample | Array of Strings | A sample of up to 100 failed record IDs. This sample aids in immediate investigation and debugging. |
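As an illustration of how one row of this schema might be handled in client code, the following is a minimal Python sketch. The class name DQScanReportRow, the failure_rate helper, and the sample values are hypothetical and not part of the product; the mapping of Long to int and Timestamp to datetime is also an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass
class DQScanReportRow:
    """Hypothetical in-memory view of one row of the external destination table."""
    project_id: int                 # Long
    data_asset_id: str              # String
    monitor_Id: int                 # Long (column name spelled as in the schema)
    scan_timestamp: datetime        # Timestamp
    total_records_failed: int       # Integer
    total_records_scanned: int      # Integer
    record_id_attribute_name: str   # String
    record_id_sample: List[str]     # Array of Strings, up to 100 entries

    def failure_rate(self) -> float:
        """Fraction of scanned records that failed; 0.0 when nothing was scanned."""
        if self.total_records_scanned == 0:
            return 0.0
        return self.total_records_failed / self.total_records_scanned


# Made-up values, for illustration only.
example_row = DQScanReportRow(
    project_id=1,
    data_asset_id="orders_table",
    monitor_Id=42,
    scan_timestamp=datetime(2024, 1, 2, 3, 0, tzinfo=timezone.utc),
    total_records_failed=20,
    total_records_scanned=1000,
    record_id_attribute_name="order_id",
    record_id_sample=["1001", "1003"],
)
print(f"{example_row.failure_rate():.2%}")
```

A derived value such as the failure rate is not stored in the table; it can be computed from total_records_failed and total_records_scanned as shown.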


User Actionability

  • To check DQ status: Query the external destination table, filtering by data_asset_id and scan_timestamp.

  • To debug failures: Use the record_id_sample together with the record_id_attribute_name to look up the failing records directly in the source data asset for diagnosis.
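The two workflows above can be sketched in Python over rows that have already been read into memory. This is only an illustration: how rows are fetched from the external table depends on where that table is stored, and the function names scans_for_asset and failing_source_records, as well as all sample data, are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List

Row = Dict[str, Any]


def scans_for_asset(report_rows: List[Row], data_asset_id: str, since: datetime) -> List[Row]:
    """Check DQ status: filter by data_asset_id and scan_timestamp, newest first."""
    matches = [
        r for r in report_rows
        if r["data_asset_id"] == data_asset_id and r["scan_timestamp"] >= since
    ]
    return sorted(matches, key=lambda r: r["scan_timestamp"], reverse=True)


def failing_source_records(report_row: Row, source_records: List[Row]) -> List[Row]:
    """Debug failures: match sampled IDs against the source asset's ID attribute."""
    id_attr = report_row["record_id_attribute_name"]
    sampled_ids = set(report_row["record_id_sample"])
    # The sample holds strings, so source IDs are compared as strings.
    return [rec for rec in source_records if str(rec.get(id_attr)) in sampled_ids]


# Made-up report rows and source records, for illustration only.
report_rows = [
    {"project_id": 1, "data_asset_id": "orders_table", "monitor_Id": 42,
     "scan_timestamp": datetime(2024, 1, 2, tzinfo=timezone.utc),
     "total_records_failed": 2, "total_records_scanned": 100,
     "record_id_attribute_name": "order_id", "record_id_sample": ["1001", "1003"]},
    {"project_id": 1, "data_asset_id": "orders_table", "monitor_Id": 42,
     "scan_timestamp": datetime(2023, 12, 1, tzinfo=timezone.utc),
     "total_records_failed": 0, "total_records_scanned": 90,
     "record_id_attribute_name": "order_id", "record_id_sample": []},
    {"project_id": 1, "data_asset_id": "customers_table", "monitor_Id": 7,
     "scan_timestamp": datetime(2024, 1, 3, tzinfo=timezone.utc),
     "total_records_failed": 0, "total_records_scanned": 50,
     "record_id_attribute_name": "customer_id", "record_id_sample": []},
]
source_records = [
    {"order_id": 1001, "amount": -5},
    {"order_id": 1002, "amount": 30},
    {"order_id": 1003, "amount": None},
]

since = datetime(2024, 1, 1, tzinfo=timezone.utc)
recent = scans_for_asset(report_rows, "orders_table", since)
failing = failing_source_records(recent[0], source_records)
print([rec["order_id"] for rec in failing])
```

Because record_id_sample holds at most 100 IDs, the lookup returns a sample of the failing records, not necessarily all of them; total_records_failed gives the full count.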


Configuring Reporting

Reporting can only be configured via APIs. Please refer to DQ Reporting APIs.
