Interactive Profiling Tool: Investigator


Last updated 9 months ago

The Investigator page lets you drill down into your dataset for deeper analysis, similar to the Overview page.

It includes a Selector Component for easy navigation and filtering.

The page also includes the following components:

Patterns

This section shows the distribution of value patterns in an attribute. Use it to understand the data and to build additional data validations, such as enforcing a specific pattern for product identifiers.

Telmai automatically calculates patterns for all attribute values. It calculates two kinds of patterns, each of which may be useful depending on the type of data:

  • Compressed Patterns - Replace each sequence of letters or digits with a single L or D, respectively

  • Expanded Patterns - Replace each letter or digit with L or D, respectively
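The two pattern kinds can be illustrated with a minimal Python sketch. This is not Telmai's implementation: the function names are hypothetical, the mapping of spaces to S follows the Drill Down column descriptions on this page, and keeping other characters (such as hyphens) unchanged is an assumption. The sketch also does not abbreviate long runs with a length number.

```python
from itertools import groupby


def char_class(ch: str) -> str:
    """Map a character to its pattern symbol (assumed mapping)."""
    if ch.isalpha():
        return "L"
    if ch.isdigit():
        return "D"
    if ch.isspace():
        return "S"
    return ch  # assumption: any other character is kept as-is


def expanded_pattern(value: str) -> str:
    """Replace every character with its class symbol, one for one."""
    return "".join(char_class(ch) for ch in value)


def compressed_pattern(value: str) -> str:
    """Collapse each run of letters, digits, or spaces into one symbol."""
    parts = []
    for symbol, run in groupby(value, key=char_class):
        if symbol in {"L", "D", "S"}:
            parts.append(symbol)  # the whole run becomes a single symbol
        else:
            parts.append("".join(run))  # other characters are copied through
    return "".join(parts)


print(expanded_pattern("AB-1234"))    # LL-DDDD
print(compressed_pattern("AB-1234"))  # L-D
```

Under these assumptions, product identifiers such as AB-1234 and XYZ-77 share the compressed pattern L-D but have different expanded patterns, which is why one kind or the other may suit a given attribute better.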

Values

Shows the distribution of occurrences for the top values in an attribute.
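As a rough illustration of what such a distribution contains, the following sketch counts occurrences and reports the most frequent values with their share of all rows. The function name `top_values` and the sample data are hypothetical; how Telmai computes or limits the top values is not described here.

```python
from collections import Counter


def top_values(values, n=5):
    """Return (value, count, share) for the n most frequent values."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(value, count, count / total) for value, count in counts.most_common(n)]


# Hypothetical sample of a "country" attribute
sample = ["US", "US", "DE", "FR", "US", "DE"]
for value, count, share in top_values(sample, n=2):
    print(f"{value}: {count} occurrences ({share:.0%})")
```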

Drill Down

Provides a sample of the dataset’s top values with similar properties. Clicking on any value in this table reveals more details about its properties. This feature can also help identify data quality (DQ) validation violations.

This table contains the following columns for each group (cohort) of values:

| Column | Description |
| --- | --- |
| Value | Sample data with similar characteristics. |
| Expectation Violation | Indicates whether the value violates a correctness expectation. |
| Failing Expectation | The associated correctness rule (only valid for violations). |
| Values Count | Number of times the value appears within its attribute. |
| Expanded Pattern | Anomaly score based on the value's expanded pattern: the value string with each letter (L), digit (D), or space (S) replaced by the character for its type. A long run of the same type is written as the type character followed by a number giving the length of the run. |
| Compressed Pattern | Anomaly score based on the value's compressed (short) pattern: the value string with each run of letters (L), digits (D), or spaces (S) reduced to the single character for its type. |
| Frequency | Anomaly score based on how frequently the value occurs; 1 is normal, 0 is abnormal. |
| Length | Anomaly score based on the number of characters in the value; 1 is normal, 0 is abnormal. |
| Special Characters | Anomaly score based on the presence of special characters in the value; 1 is normal, 0 is abnormal. |
| Spaces | Anomaly score based on the number of whitespace characters; 1 is normal, 0 is abnormal. |
| Is Date | True if the value is an ISO date, false otherwise. |
| Is DateTime | True if the value is a datetime, false otherwise. |
| Is Number | True if the value is numeric, false otherwise. |
| Is Alpha | True if the value consists of alphabetical characters, false otherwise. |

This table can also be used to understand DQ validation violations, which are described in a later section.
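The four boolean columns (Is Date, Is DateTime, Is Number, Is Alpha) can be approximated with a short Python sketch. The exact rules Telmai applies are not documented on this page, so the accepted formats here are assumptions: `YYYY-MM-DD` for dates, `YYYY-MM-DD HH:MM:SS` for datetimes, and anything Python's `float()` accepts for numbers. All function names are hypothetical.

```python
from datetime import datetime


def _parses(value: str, fmt: str) -> bool:
    """True if the whole string matches the given strptime format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


def is_number(value: str) -> bool:
    """True if float() accepts the string (this also accepts 'nan' and 'inf')."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def type_flags(value: str) -> dict:
    """Compute the four boolean flags for one value (assumed formats)."""
    return {
        "is_date": _parses(value, "%Y-%m-%d"),               # assumed ISO date format
        "is_datetime": _parses(value, "%Y-%m-%d %H:%M:%S"),  # assumed datetime format
        "is_number": is_number(value),
        "is_alpha": value.isalpha(),
    }


for sample in ["2024-01-15", "2024-01-15 10:30:00", "42", "abc"]:
    print(sample, type_flags(sample))
```

A value for which all four flags are false, in an attribute where most values are dates or numbers, is a natural candidate for closer inspection in the Drill Down table.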