Model Training
All data sent to Telmai is subject to statistical and machine learning analysis. Machine learning (ML) is used to automatically detect anomalies, produce data quality (DQ) scores, and highlight the most important anomaly factors in the dataset.

Initial Load

When the first batch is loaded for a data source, ML model training is triggered unless it is explicitly skipped with the train_model=false query parameter. Training can take up to 80% of the processing time, so it can be slow for very large datasets. It is therefore recommended to prime the data source with a relatively small sample of data (up to 100MB) to train the models. Subsequent loads to the same data source skip the training step regardless of the train_model flag, because models for the existing attributes are already in place.
If a new attribute is detected in a subsequent load, a model is trained for that attribute only, provided that train_model=true.
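The rules above can be summarized as a small decision function. This is only an illustrative sketch of the behavior described on this page, not Telmai's implementation: the function name attributes_to_train and its arguments are hypothetical.

```python
from typing import Iterable, Set


def attributes_to_train(
    batch_attributes: Iterable[str],
    trained_attributes: Iterable[str],
    train_model: bool = True,
) -> Set[str]:
    """Return the attributes for which a model would be trained on this load.

    Hypothetical helper that mirrors the documented rules:
    - train_model=false skips training entirely;
    - otherwise only attributes without an existing model are trained.
    """
    if not train_model:
        return set()
    return set(batch_attributes) - set(trained_attributes)


# First load: no models exist yet, so every attribute is trained.
first_load = attributes_to_train(["name", "email"], [])

# Later load with a new attribute: only the new attribute is trained.
later_load = attributes_to_train(["name", "email", "phone"], ["name", "email"])

# Later load with train_model=false: nothing is trained, even for new attributes.
skipped = attributes_to_train(["name", "email", "phone"], ["name", "email"], train_model=False)
```

Under these assumptions, a load that contains only attributes with existing models yields an empty set whatever the flag value, which matches the statement that subsequent loads skip training.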
train_model query parameter when loading data
true - run model training as part of processing the batch. Only data from the current batch is used for training, and a model is trained only for attributes that do not already have one
false - the model training step is omitted
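As a rough illustration of setting the flag, the sketch below appends train_model to a load URL using Python's standard library. The URL is a placeholder and the helper name with_train_model is hypothetical; the actual endpoint and any other required parameters are not given on this page.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_train_model(url: str, train_model: bool) -> str:
    """Return `url` with the train_model query parameter set to 'true' or 'false'."""
    parts = urlsplit(url)
    # Keep query parameters already present, dropping any earlier train_model value.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "train_model"
    ]
    params.append(("train_model", "true" if train_model else "false"))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Placeholder URL for illustration only; substitute the real load endpoint.
load_url = with_train_model("https://example.invalid/load", train_model=False)
```

The value is written as lowercase true or false to match the parameter values listed above.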