Monitoring Data
Telmai helps you track ongoing data issues by providing a clear and comprehensive view of all your alerts and incidents.
Key Concepts
An Alert is the initial notification that a system's metric has violated a predefined threshold. It's an automatic signal that something is amiss, such as a spike in CPU usage or a high number of transaction failures.
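As a rough illustration of the alert concept, the sketch below checks one metric observation against an upper threshold and emits an alert record only when that threshold is exceeded. The `Alert` class, its fields, and `check_threshold` are hypothetical names chosen for this example. They are not part of Telmai's API, and Telmai's actual threshold logic (for example, lower bounds or thresholds derived from historical data) may differ.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Alert:
    # Hypothetical alert record; field names are illustrative only.
    metric: str
    value: float
    threshold: float
    timestamp: datetime


def check_threshold(metric: str, value: float, threshold: float,
                    timestamp: datetime) -> Optional[Alert]:
    """Return an Alert when the observed value exceeds the upper threshold, else None."""
    if value > threshold:
        return Alert(metric, value, threshold, timestamp)
    return None


if __name__ == "__main__":
    ts = datetime(2024, 1, 1, 12, 0)
    # 57 failures against a threshold of 50 violates the threshold, so an Alert is returned.
    print(check_threshold("transaction_failures", 57, 50, ts))
    # 12 failures stays within the threshold, so no alert is produced (prints None).
    print(check_threshold("transaction_failures", 12, 50, ts))
```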
An Incident is a broader event that is triggered by one or more alerts. It represents an ongoing issue that requires investigation and resolution. A single incident may contain multiple, related alerts that help tell the story of the problem.
Think of it this way: a single data point violating a threshold generates an Alert. When multiple alerts occur that are related to the same problem, they can be grouped into a single Incident to provide a consolidated view of the issue.
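To make the alert-to-incident relationship concrete, here is a minimal sketch of one possible grouping rule. It assumes, purely for illustration, that alerts on the same data source belong to the same incident when each arrives within a fixed time window of the previous one. The `source` field, the `window` parameter, and `group_alerts` are hypothetical; Telmai's real correlation rules are not described on this page and may work differently.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class Alert:
    # Hypothetical alert record; "source" stands in for whatever the alert is attached to.
    source: str
    metric: str
    timestamp: datetime


@dataclass
class Incident:
    # Hypothetical incident that collects related alerts in time order.
    source: str
    alerts: List[Alert] = field(default_factory=list)

    @property
    def last_seen(self) -> datetime:
        # Alerts are appended in timestamp order, so the last one is the most recent.
        return self.alerts[-1].timestamp


def group_alerts(alerts: List[Alert], window: timedelta) -> List[Incident]:
    """Group alerts on the same source that arrive within `window` of the previous alert."""
    incidents: List[Incident] = []
    latest_by_source: Dict[str, Incident] = {}  # most recent incident per source
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        current = latest_by_source.get(alert.source)
        if current is not None and alert.timestamp - current.last_seen <= window:
            current.alerts.append(alert)  # related alert: extend the existing incident
        else:
            current = Incident(alert.source, [alert])  # unrelated: start a new incident
            incidents.append(current)
            latest_by_source[alert.source] = current
    return incidents


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 9, 0)
    alerts = [
        Alert("orders", "null_rate", t0),
        Alert("orders", "row_count", t0 + timedelta(minutes=10)),
        Alert("customers", "null_rate", t0 + timedelta(minutes=15)),
        Alert("orders", "null_rate", t0 + timedelta(hours=5)),
    ]
    for inc in group_alerts(alerts, window=timedelta(hours=1)):
        print(inc.source, len(inc.alerts))
    # orders 2
    # customers 1
    # orders 1
```

Here the first two `orders` alerts fall within one hour of each other and form a single incident, while the late `orders` alert opens a new one. A time window is only one reasonable correlation heuristic; grouping by shared root cause or lineage would be equally valid choices.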
Incidents Portal
This page provides a comprehensive view of all incidents, helping you quickly understand the status of your monitored systems. The portal is designed to give you a high-level overview of ongoing issues while also allowing you to drill down into specific details.
Key Features
The main dashboard is organized into a few key sections:
Incidents Cards: At the top of the portal, you'll see a series of cards that provide a quick snapshot of incident activity, including the number of open and closed incidents, the mean time to resolve (MTTR), and the tag distribution.
Open Incidents by Impact: This graph tracks the number of open incidents over time, categorized by their impact level. This allows you to quickly see the severity of current issues and identify any spikes in high-impact incidents.
Incidents Table: The table provides a detailed, sortable list of all incidents, both active and closed. You can review key information at a glance, including the incident's status, creation time, and any associated external tickets.
Incident Details: Clicking on any incident in the table will open a new view with more granular information. This view provides a full history of the incident, including a list of the alerts that triggered it. This helps you understand the specific events and data points that led to the incident's creation.
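The MTTR card and the Open Incidents by Impact graph rest on two simple calculations: MTTR is conventionally the average time from when an incident is opened until it is resolved, and an incident counts as open at a given moment if it was created by then and not yet resolved. The sketch below shows one plausible way to compute both, assuming each incident records an impact level, a creation time, and an optional resolution time. The `Incident` fields and the function names are hypothetical, and Telmai's own definitions may differ.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class Incident:
    # Hypothetical incident summary; field names are illustrative only.
    impact: str                              # e.g. "high", "medium", "low"
    created_at: datetime
    resolved_at: Optional[datetime] = None   # None while the incident is still open


def mean_time_to_resolve(incidents: List[Incident]) -> Optional[timedelta]:
    """Average (resolved_at - created_at) over closed incidents; None if none are closed."""
    durations = [i.resolved_at - i.created_at for i in incidents if i.resolved_at is not None]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


def open_by_impact(incidents: List[Incident], at: datetime) -> Dict[str, int]:
    """Count incidents that were open at time `at`, keyed by impact level."""
    counts = Counter(
        i.impact for i in incidents
        if i.created_at <= at and (i.resolved_at is None or i.resolved_at > at)
    )
    return dict(counts)


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1)
    incidents = [
        Incident("high", t0, t0 + timedelta(hours=2)),
        Incident("low", t0, t0 + timedelta(hours=6)),
        Incident("high", t0 + timedelta(hours=1)),  # still open
    ]
    print(mean_time_to_resolve(incidents))                      # 4:00:00
    print(open_by_impact(incidents, t0 + timedelta(hours=3)))   # {'low': 1, 'high': 1}
```

Evaluating `open_by_impact` at a series of timestamps yields the data points for a chart of open incidents over time, split by impact level.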

To learn more about how to set up your thresholds, please refer to the following pages: