Skip to content

Data Quality

Overview

Data Streams can have automated validation checks enabled to monitor the quality of the data being received from the Asset.

These data quality validations are done at the edge in real time and any Application can subscribe to the results.

Note

The results are also saved to the Kelvin Cloud and will become available in future releases of Kelvin.

There are many types of data quality validation algorithms available to detect issues and maintain the integrity and reliability of data within the Kelvin Platform.

Data Quality Registration

Data Quality first needs to be registered.

Once registered, the selected data quality applications will monitor the data coming from the Asset/Data Stream pairs and output the results.

Note

Some Data Quality Applications will output regular reports, like Edge Data Availability, and others will only output reports when a problem is detected, like Outlier Detection.

Data Quality Inputs

Applications can subscribe selected Data Streams to run specific Data Quality validations.

Warning

You can subscribe to Asset/Data Streams for Data Quality validations which are not registered, but you will not receive any outputs as they are not being processed.

They can then see the validation's results in real time and react to any data quality issues. The type of reaction depends on the developer's requirements, for example sending emails or slack messages.

There are a number of inbuilt Data Quality validation options available.

Validation Description Configurable Parameters
kelvin_timestamp_anomaly Detects anomalies or irregularities in the timestamp sequence None
kelvin_duplicate_detection Detects duplicate values within a defined window size window_size (default: 5)
kelvin_out_of_range_detection Validates whether values fall within an expected range min_threshold, max_threshold
kelvin_outlier_detection Uses statistical methods to detect outliers over a moving window model, threshold (default: 3), window_size (default: 10)
kelvin_data_availability Ensures expected number of messages are received in a given time window window_expected_number_msgs, window_time_interval_unit (second, minute, hour, day)

All the validations are also saved in the Cloud and historical data will be accessible through the Kelvin UI in future releases.

Data Quality Outputs

It is also possible to create your own custom Data Quality Applications that will process and validate incoming Data Streams and then produce data quality information that can be used by other Applications.

The other Applications will connect to the custom validations through the Data Quality input key in the app.yaml and not directly to the Application doing the validation calculations.

All the validations are also saved in the Cloud and historical data will be accessible through the Kelvin UI in future releases.