International Business Machines Corporation
Dynamic Discovery and Correction of Data Quality Issues
Abstract:
A computing device, method, and system are provided for improving data quality to conserve computational resources. The computing device receives a raw dataset and one or more data quality metric goals corresponding to that dataset. A schema of the dataset is determined. An initial set of validation nodes is identified based on the schema of the dataset, and the initial set is executed. A next set of validation nodes is iteratively expanded and executed based on the schema of the dataset until a termination criterion is reached. A corrected version of the raw dataset is provided based on the iterative execution of the initial and next sets of validation nodes.
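The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the abstract does not say what a validation node does, which metrics are used, or how the next set is expanded. Every name here (`infer_schema`, `completeness`, `blank_to_none`, `fill_missing`, `correct_dataset`) is hypothetical. The sketch assumes that all rows share the same keys, that a single metric (the fraction of non-null cells) stands in for the data quality metric goals, and that the termination criterion is "goal met, no nodes left, or round limit reached".

```python
from collections import Counter
from statistics import median


def infer_schema(rows):
    """Label each column 'numeric' or 'text' from its first non-null value."""
    schema = {}
    for row in rows:
        for col, val in row.items():
            if val is None or col in schema:
                continue
            schema[col] = "numeric" if isinstance(val, (int, float)) else "text"
    return schema


def completeness(rows):
    """Assumed quality metric: fraction of cells that are not None."""
    cells = [v for row in rows for v in row.values()]
    return sum(v is not None for v in cells) / len(cells) if cells else 1.0


def blank_to_none(rows, col, kind):
    """Validation node: treat whitespace-only strings as missing values."""
    for row in rows:
        if isinstance(row[col], str) and not row[col].strip():
            row[col] = None


def fill_missing(rows, col, kind):
    """Validation node: fill gaps with the median (numeric) or mode (text)."""
    present = [row[col] for row in rows if row[col] is not None]
    if not present:
        return
    if kind == "numeric":
        fill = median(present)
    else:
        fill = Counter(present).most_common(1)[0][0]
    for row in rows:
        if row[col] is None:
            row[col] = fill


def correct_dataset(raw_rows, goal=1.0, max_rounds=5):
    """Run validation nodes round by round until the goal or a limit is hit."""
    rows = [dict(row) for row in raw_rows]  # leave the raw dataset untouched
    schema = infer_schema(rows)
    # Initial set of nodes, chosen from the schema: normalise text columns.
    frontier = [(blank_to_none, col) for col, kind in schema.items() if kind == "text"]
    executed = set()
    for _ in range(max_rounds):
        for node, col in frontier:
            node(rows, col, schema[col])
            executed.add((node, col))
        if completeness(rows) >= goal:
            break  # termination criterion: metric goal reached
        # Expand the next set: fill nodes for columns that still have gaps.
        frontier = [
            (fill_missing, col)
            for col in schema
            if (fill_missing, col) not in executed
            and any(row[col] is None for row in rows)
        ]
        if not frontier:
            break  # termination criterion: nothing left to try
    return rows


raw = [
    {"age": 30, "city": "Paris"},
    {"age": None, "city": "  "},
    {"age": 50, "city": "Paris"},
]
fixed = correct_dataset(raw)
```

In this example the first round only converts the blank `city` value to a missing value, which leaves the completeness goal unmet. The second round then expands to fill nodes for both columns. Running only the nodes that the schema and the remaining gaps call for is one plausible reading of how iterative expansion could conserve computation.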
Status:
Application
Type:
Utility
Filing date:
20 Oct 2020
Issue date:
21 Oct 2021