Interactive Profiling Tool: Investigator
The Investigator page, similar to the Overview page, lets you drill down into your dataset for deeper analysis.
It includes a Selector Component for easy navigation and filtering.
The page also includes the following components:
Patterns
This section shows the distribution of various value patterns. This can be used to understand the data and build more data validations, like enforcing specific patterns for product identifiers.
Telmai automatically calculates patterns for all attribute values. There are two kinds of patterns calculated, each of which may be useful based on the type of the data:
Compressed Patterns - Replace each sequence of consecutive letters or digits with a single L or D, respectively
Expanded Patterns - Replace each individual letter or digit with L or D, respectively
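The two pattern kinds can be illustrated with a minimal Python sketch. It is not Telmai's implementation: the function names `expanded_pattern` and `compressed_pattern` are hypothetical, the mapping of spaces to S follows the column descriptions further down this page, and leaving all other characters unchanged is an assumption.

```python
import re


def expanded_pattern(value: str) -> str:
    """Replace each letter with L, each digit with D, and each space with S.

    Assumption: any other character (punctuation, symbols) is kept as-is.
    """
    out = []
    for ch in value:
        if ch.isalpha():
            out.append("L")
        elif ch.isdigit():
            out.append("D")
        elif ch == " ":
            out.append("S")
        else:
            out.append(ch)
    return "".join(out)


def compressed_pattern(value: str) -> str:
    """Collapse each run of identical type characters into a single one."""
    return re.sub(r"([LDS])\1+", r"\1", expanded_pattern(value))


print(expanded_pattern("SKU-00123"))    # LLL-DDDDD
print(compressed_pattern("SKU-00123"))  # L-D
```

A product identifier rule such as "three letters, a hyphen, five digits" would correspond to the expanded pattern `LLL-DDDDD`, while the compressed pattern `L-D` groups identifiers regardless of length.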
Values
Shows the distribution of occurrences for the top values in an attribute.
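As a rough illustration of what a top-values distribution contains, the sketch below counts occurrences in a list of attribute values. The helper name `top_values` and the sample data are hypothetical, and the way Telmai computes this internally is not described here.

```python
from collections import Counter


def top_values(values, n=5):
    """Return (value, count, share) tuples for the n most frequent values."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(value, count, count / total) for value, count in counts.most_common(n)]


# Hypothetical attribute values
colors = ["red", "blue", "red", "green", "red", "blue"]
for value, count, share in top_values(colors, n=2):
    print(f"{value}: {count} ({share:.0%})")
```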
Drill Down
Provides a sample of the dataset’s top values with similar properties. Clicking on any value in this table reveals more details about its properties. The table also helps identify data quality (DQ) validation violations, which are described in a later section.
In this table, you can find the following columns for each group (cohort) of values:
| Column | Description |
| --- | --- |
| Value | Sample data with similar characteristics. |
| Expectation Violation | Indicates whether the value violates a correctness expectation. |
| Failing Expectation | The associated correctness rule (only valid for violations). |
| Values Count | Number of times the value appears within its attribute. |
| Expanded Pattern | Anomaly score based on the value’s expanded pattern: the value string with each letter (L), digit (D), or space (S) replaced by the character representing its type. A long sequence of the same type is represented by the type character and a number indicating the length of the sequence. |
| Compressed Pattern | Anomaly score based on the value’s compressed pattern: the value string with each sequence of letters (L), digits (D), or spaces (S) reduced to the single character representing the type. |
| Frequency | Anomaly score based on how frequently the value occurs; 1 is normal, 0 is abnormal. |
| Length | Anomaly score based on the number of characters in the value; 1 is normal, 0 is abnormal. |
| Special Characters | Anomaly score based on the presence of special characters in the value; 1 is normal, 0 is abnormal. |
| Spaces | Anomaly score based on the number of whitespace characters; 1 is normal, 0 is abnormal. |
| Is Date | True if the value is an ISO date, false otherwise. |
| Is DateTime | True if the value is a datetime, false otherwise. |
| Is Number | True if the value is numeric, false otherwise. |
| Is Alpha | True if the value consists of alphabetical characters, false otherwise. |
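The boolean type columns can be approximated with standard Python checks. This is only a sketch of the idea: the function names are hypothetical, and the exact rules Telmai applies (accepted date formats, locale-specific numbers, and so on) are not documented on this page and may differ.

```python
from datetime import date, datetime


def is_date(value: str) -> bool:
    """True if the value parses as an ISO date such as 2024-01-15."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


def is_datetime(value: str) -> bool:
    """True if the value parses as an ISO datetime.

    Note: Python also accepts date-only strings here (time defaults to midnight).
    """
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        return False


def is_number(value: str) -> bool:
    """True if the value can be read as a floating-point number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def is_alpha(value: str) -> bool:
    """True if the value is non-empty and contains only alphabetic characters."""
    return value.isalpha()


for sample in ["2024-01-15", "2024-01-15T10:30:00", "3.14", "abc", "abc1"]:
    print(sample, is_date(sample), is_datetime(sample), is_number(sample), is_alpha(sample))
```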