Interactive Profiling Tool: Investigator
The Investigator page lets you drill down into your dataset for deeper analysis, similar to the Overview page.
It includes a Selector Component for easy navigation and filtering.
The page also contains the following components:
Patterns
This section shows the distribution of various value patterns. This can be used to understand the data and build more data validations, like enforcing specific patterns for product identifiers.
Telmai automatically calculates patterns for all attribute values. It calculates two kinds of patterns, each of which may be useful depending on the type of data:
Compressed Patterns - Replace each sequence of letters or digits with a single L or D, respectively
Expanded Patterns - Replace each letter or digit with L or D, respectively
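The two pattern kinds described above can be illustrated with a minimal Python sketch. This is not Telmai's implementation: the function names `expanded_pattern` and `compressed_pattern` are hypothetical, and the sketch assumes that characters other than letters and digits (hyphens, spaces, and so on) are left unchanged.

```python
def _char_class(ch: str) -> str:
    """Map a letter to 'L' and a digit to 'D'; keep any other character as-is (assumption)."""
    if ch.isalpha():
        return "L"
    if ch.isdigit():
        return "D"
    return ch


def expanded_pattern(value: str) -> str:
    """Hypothetical helper: replace each letter or digit with L or D."""
    return "".join(_char_class(ch) for ch in value)


def compressed_pattern(value: str) -> str:
    """Hypothetical helper: collapse each run of letters or digits into a single L or D."""
    out = []
    for ch in value:
        cls = _char_class(ch)
        # Skip the character only when it continues a run of the same L/D class.
        if cls in ("L", "D") and out and out[-1] == cls:
            continue
        out.append(cls)
    return "".join(out)


print(expanded_pattern("AB-1234"))    # LL-DDDD
print(compressed_pattern("AB-1234"))  # L-D
```

Under these assumptions, product identifiers such as `AB-1234` and `XYZ-99` share the compressed pattern `L-D` but have different expanded patterns, which is why one kind or the other may suit a given attribute better.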
Values
Shows the distribution of occurrences for the top values in an attribute.
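To reproduce a comparable view outside the tool, a distribution of top values can be sketched with Python's standard library. The function name `top_values` and the sample data are illustrative assumptions and do not reflect how Telmai computes the chart.

```python
from collections import Counter


def top_values(values, n=5):
    """Hypothetical helper: return (value, count, share) for the n most frequent values."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(value, count, count / total) for value, count in counts.most_common(n)]


# Illustrative sample data, not taken from a real dataset.
sample = ["a", "b", "a", "c", "a", "b"]
for value, count, share in top_values(sample, n=2):
    print(f"{value}: {count} ({share:.0%})")
```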
Drill Down
Provides a sample of the dataset’s top values, grouped by similar properties. Clicking any value in this table reveals more details about its properties. The table can also help identify data quality (DQ) validation violations, which are described in a later section.
The table contains the following columns for each group (cohort) of values:
Column | Description |
Value | Sample data with similar characteristics |
Expectation Violation | Indicates if the value violates a correctness expectation |
Failing Expectation | The associated correctness rule (only valid for violations) |
Values Count | Number of times the value appears within its attribute |
Expanded Pattern | Anomaly score based on the value’s expanded pattern: the value string with each letter, digit, or space replaced by its type character (L, D, or S). A long run of the same type is written as the type character followed by the run length. |
Compressed Pattern | Anomaly score based on the value’s compressed (short) pattern: the value string with each run of letters, digits, or spaces reduced to a single type character (L, D, or S). |
Frequency | Anomaly score based on how frequently the value occurs. A score of 1 is normal; 0 is abnormal. |
Length | Anomaly score based on the number of characters in the value; 1 is normal, 0 is abnormal. |
Special Characters | Anomaly score based on the presence of special characters in the value; 1 is normal, 0 is abnormal. |
Spaces | Anomaly score based on the number of whitespace characters; 1 is normal, 0 is abnormal. |
Is Date | True if the value is an ISO date, false otherwise. |
Is DateTime | True if the value is a datetime, false otherwise. |
Is Number | True if the value is numeric, false otherwise. |
Is Alpha | True if the value consists of alphabetical characters, false otherwise. |
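The boolean columns at the end of the table can be approximated with simple checks. The sketch below is an assumption about what such checks might look like, not Telmai's actual logic: the function names are hypothetical, "ISO date" is taken to mean the `YYYY-MM-DD` form, and "numeric" is taken to mean an optionally signed integer or decimal.

```python
import re
from datetime import datetime

_ISO_DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")
_NUMBER_RE = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")


def is_date(value: str) -> bool:
    """Assumed rule: value is a valid calendar date written as YYYY-MM-DD."""
    if not _ISO_DATE_RE.fullmatch(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def is_number(value: str) -> bool:
    """Assumed rule: optionally signed integer or decimal, no exponent or separators."""
    return _NUMBER_RE.fullmatch(value) is not None


def is_alpha(value: str) -> bool:
    """Assumed rule: non-empty and made up only of alphabetical characters."""
    return value.isalpha()


for sample in ["2024-02-29", "2024-02-30", "3.14", "abc", "abc1"]:
    print(sample, is_date(sample), is_number(sample), is_alpha(sample))
```

The regular expression guard in `is_date` is a deliberate choice: `strptime` on its own accepts unpadded forms such as `2024-2-9`, which would not match the stricter `YYYY-MM-DD` assumption.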