Data Quality External Reporting

This page explains the Centralized DQ Monitor Scan Reporting feature, which generates a comprehensive, standardized report for every Data Quality (DQ) Monitor scan and appends the results to a designated external table. This mechanism centralizes DQ metrics from various sources and monitors, providing a unified, historical view of data quality performance.


Report Structure (External Destination Table Schema)

The external table is the central repository for all DQ scan results. It provides a standardized format that allows users to query and analyze DQ performance across all projects, data assets, and monitors.

| Column Name | Data Type | Description |
|---|---|---|
| project_id | Long | Identifier for the project where the data asset resides. |
| data_asset_id | String | Identifier for the specific data asset (e.g., table/stream) being monitored. |
| monitor_Id | Long | Unique identifier for the Monitor that performed the check. This is the primary key for tracking the rule execution. |
| scan_timestamp | Timestamp | The exact time the DQ job completed execution. |
| total_records_failed | Integer | Count of records that failed the specific Monitor's check. |
| total_records_scanned | Integer | Total count of records processed by the job for this check. |
| record_id_attribute_name | String | The name of the primary key/ID attribute used to uniquely identify records in the data asset. |
| record_id_sample | Array of Strings | A sample of up to 100 failed record IDs. This sample aids in immediate investigation and debugging. |
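
As a rough illustration of how one row of this schema might be represented in client code, the sketch below maps each column to a Python field and derives a failure rate from the two count columns. The class name DQScanReportRow, the Python type mapping, the failure_rate helper, and all sample values are assumptions made for illustration; they are not part of the product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class DQScanReportRow:
    """One row of the external destination table (hypothetical Python mapping)."""

    project_id: int                # Long
    data_asset_id: str             # String
    monitor_Id: int                # Long (spelled as in the schema above)
    scan_timestamp: datetime       # Timestamp
    total_records_failed: int      # Integer
    total_records_scanned: int     # Integer
    record_id_attribute_name: str  # String
    record_id_sample: List[str]    # Array of Strings, up to 100 entries

    def failure_rate(self) -> float:
        """Fraction of scanned records that failed; 0.0 when nothing was scanned."""
        if self.total_records_scanned == 0:
            return 0.0
        return self.total_records_failed / self.total_records_scanned


# Illustrative values only; not taken from a real scan.
example_row = DQScanReportRow(
    project_id=1,
    data_asset_id="orders_table",
    monitor_Id=42,
    scan_timestamp=datetime(2024, 1, 15, 8, 30, tzinfo=timezone.utc),
    total_records_failed=5,
    total_records_scanned=200,
    record_id_attribute_name="order_id",
    record_id_sample=["1001", "1002", "1003", "1004", "1005"],
)
print(example_row.failure_rate())  # 0.025
```

Because record_id_sample holds at most 100 IDs, its length can be smaller than total_records_failed; the count columns, not the sample, are what should be used for metrics.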


User Actionability

  • To check DQ status: Query the external destination table, filtering by data_asset_id and scan_timestamp.

  • To debug failures: Use the IDs in record_id_sample together with record_id_attribute_name to look up the failing records directly in the source data asset.
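
The two workflows above could be sketched as follows. This is a minimal, self-contained example under stated assumptions: the report rows are held in memory as dictionaries, the source data asset is stood in for by an in-memory SQLite table named orders, and the helpers latest_scan and build_lookup_query are hypothetical names, not product APIs. In practice the report would be read from the configured external destination, and the lookup would run against the real source system with its own placeholder syntax.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]


def latest_scan(rows: List[Row], data_asset_id: str, since: datetime) -> Optional[Row]:
    """Return the most recent report row for an asset scanned at or after `since`."""
    matches = [
        r for r in rows
        if r["data_asset_id"] == data_asset_id and r["scan_timestamp"] >= since
    ]
    return max(matches, key=lambda r: r["scan_timestamp"], default=None)


def build_lookup_query(row: Row, source_table: str) -> Tuple[str, List[str]]:
    """Build a parameterized query selecting the sampled failing records.

    The table and column names are interpolated as identifiers, so they
    must come from a trusted source; the IDs are passed as parameters.
    """
    ids = list(row["record_id_sample"])
    if not ids:
        raise ValueError("record_id_sample is empty; nothing to look up")
    id_column = row["record_id_attribute_name"]
    placeholders = ", ".join("?" for _ in ids)
    sql = f"SELECT * FROM {source_table} WHERE {id_column} IN ({placeholders})"
    return sql, ids


# Illustrative report rows (only the columns used here are included).
report_rows: List[Row] = [
    {"data_asset_id": "orders_table",
     "scan_timestamp": datetime(2024, 1, 14, 8, 30, tzinfo=timezone.utc),
     "total_records_failed": 0, "total_records_scanned": 200,
     "record_id_attribute_name": "order_id", "record_id_sample": []},
    {"data_asset_id": "orders_table",
     "scan_timestamp": datetime(2024, 1, 15, 8, 30, tzinfo=timezone.utc),
     "total_records_failed": 2, "total_records_scanned": 200,
     "record_id_attribute_name": "order_id", "record_id_sample": ["1002", "1004"]},
]

# 1. Check DQ status: latest scan for the asset since the start of the month.
row = latest_scan(report_rows, "orders_table", datetime(2024, 1, 1, tzinfo=timezone.utc))
assert row is not None
print(row["total_records_failed"], "of", row["total_records_scanned"], "records failed")

# 2. Debug failures: look up the sampled IDs in a stand-in source table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("1001", 10.0), ("1002", -5.0), ("1003", 7.5), ("1004", None)],
)
sql, params = build_lookup_query(row, "orders")
failing = conn.execute(sql, params).fetchall()
print(failing)
```

Passing the sampled IDs as query parameters rather than concatenating them into the SQL string keeps the lookup safe even if record IDs contain quotes or other special characters.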


Supported Destinations

DQ scan results can be written to the following external storage destinations:

  • Amazon S3

  • Azure File Storage

  • Google Cloud Storage (GCS)


Configuring Reporting

Reporting is configured via API. Please refer to the DQ Reporting APIs page for the full reference.
