International Business Machines Corporation
Dynamic Discovery and Correction of Data Quality Issues

Abstract:

A computing device, method, and system are provided for improving data quality to conserve computational resources. The computing device receives a raw dataset. One or more data quality metric goals corresponding to the received raw dataset are received. A schema of the dataset is determined. An initial set of validation nodes is identified based on the schema of the dataset. The initial set of validation nodes is executed. A next set of validation nodes is iteratively expanded and executed based on the schema of the dataset until a termination criterion is reached. A corrected version of the raw dataset is provided based on the iterative execution of the initial and next sets of validation nodes.
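
The iterative flow described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the node registry `NODES`, the helper names (`infer_schema`, `completeness`, `correct_dataset`), the two-type schema, the choice of completeness as the quality metric, and the `max_rounds` cap are all assumptions made for illustration only.

```python
from statistics import mean

def infer_schema(rows):
    """Hypothetical schema step: a column is 'numeric' if any value is a number, else 'text'."""
    schema = {}
    for row in rows:
        for col, val in row.items():
            if isinstance(val, (int, float)) and not isinstance(val, bool):
                schema[col] = "numeric"
            else:
                schema.setdefault(col, "text")
    return schema

def completeness(rows):
    """Assumed quality metric: fraction of cells that are not None."""
    cells = [v for row in rows for v in row.values()]
    return sum(v is not None for v in cells) / len(cells) if cells else 1.0

def strip_whitespace(rows, col):
    for row in rows:
        if isinstance(row.get(col), str):
            row[col] = row[col].strip()

def fill_missing_text(rows, col):
    for row in rows:
        if row.get(col) is None:
            row[col] = "unknown"

def fill_missing_numeric(rows, col):
    present = [row[col] for row in rows if row.get(col) is not None]
    if not present:
        return  # nothing to derive a fill value from
    fill = mean(present)
    for row in rows:
        if row.get(col) is None:
            row[col] = fill

# Hypothetical node registry: name -> (schema type, action, follow-up node names).
NODES = {
    "strip_whitespace": ("text", strip_whitespace, ["fill_missing_text"]),
    "fill_missing_text": ("text", fill_missing_text, []),
    "fill_missing_numeric": ("numeric", fill_missing_numeric, []),
}
INITIAL_NODES = ["strip_whitespace", "fill_missing_numeric"]

def correct_dataset(raw_rows, completeness_goal=1.0, max_rounds=10):
    rows = [dict(r) for r in raw_rows]  # work on a copy of the raw dataset
    schema = infer_schema(rows)
    types_present = set(schema.values())
    # Initial set of validation nodes, selected from the schema.
    frontier = [n for n in INITIAL_NODES if NODES[n][0] in types_present]
    executed = []
    rounds = 0
    while frontier and rounds < max_rounds:
        next_frontier = []
        for name in frontier:
            kind, action, followups = NODES[name]
            for col, col_kind in schema.items():
                if col_kind == kind:
                    action(rows, col)
            executed.append(name)
            # Expand the next set of nodes from the node just executed.
            for f in followups:
                if f not in executed and f not in next_frontier:
                    next_frontier.append(f)
        rounds += 1
        # Termination criterion: the quality goal is met.
        if completeness(rows) >= completeness_goal:
            break
        frontier = next_frontier
    return rows, executed
```

For example, calling `correct_dataset` on three rows with a padded name and one row of missing values would run the two initial nodes first, then expand to `fill_missing_text` in a second round because the completeness goal was not yet met. The loop also stops when no further nodes remain or the round cap is hit, which stand in here for whatever termination criteria an actual system would use.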

Status:

Application

Type:

Utility

Filing date:

20 Oct 2020

Issue date:

21 Oct 2021