Microsoft Corporation
DATA VALIDATION USING INFERRED PATTERNS
Abstract:
Aspects of the present disclosure relate to data validation using inferred patterns. Columns of a data store may be processed to generate a set of candidate patterns for each respective column, and these sets may be combined to form a combined set of candidate patterns. Columns of the data store may then be processed using the combined set of candidate patterns to generate pattern scores for each candidate pattern with respect to each respective column. The candidate patterns may be ranked according to the pattern scores for a given column. For example, the patterns may be ranked using an impurity score indicative of the percentage of rows not represented by a pattern and/or a coverage score indicative of the number of columns in the data store to which the pattern applies. A ranked pattern may be selected manually or automatically and then applied to perform data validation of new data.
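The pipeline described in the abstract (per-column candidate generation, a combined candidate set, impurity and coverage scores, ranking, then validation of new data) can be illustrated with a minimal Python sketch. It is not the claimed method. Several details are assumptions made for illustration. Candidate patterns come from a hypothetical character-class tokenizer that emits one fixed-length and one variable-length regular expression per value. Impurity is the fraction of a column's values that a pattern does not fully match. A pattern is assumed to "apply" to a column when its impurity there is at most a hypothetical `max_impurity` threshold. Ranking orders patterns by lower impurity first and higher coverage second. All function names are hypothetical.

```python
import re
import string
from typing import Dict, List, Set, Tuple

DataStore = Dict[str, List[str]]  # hypothetical: column name -> string values


def char_class(ch: str) -> str:
    """Map one character to a coarse regex class (hypothetical tokenizer)."""
    if ch in string.digits:
        return "[0-9]"
    if ch in string.ascii_letters:
        return "[A-Za-z]"
    return re.escape(ch)  # punctuation and other symbols stay literal


def candidate_patterns(value: str) -> Set[str]:
    """Generate a fixed-length and a variable-length pattern for one value."""
    runs: List[Tuple[str, int]] = []  # (character class, run length)
    for ch in value:
        cls = char_class(ch)
        if runs and runs[-1][0] == cls:
            runs[-1] = (cls, runs[-1][1] + 1)
        else:
            runs.append((cls, 1))
    fixed = "".join(c if n == 1 else f"{c}{{{n}}}" for c, n in runs)
    variable = "".join(f"{c}+" for c, _ in runs)
    return {fixed, variable}


def combined_candidates(store: DataStore) -> Set[str]:
    """Union of the per-column candidate sets across the whole data store."""
    combined: Set[str] = set()
    for values in store.values():
        for value in values:
            combined |= candidate_patterns(value)
    return combined


def impurity(pattern: str, values: List[str]) -> float:
    """Fraction of a column's values that the pattern does NOT fully match."""
    if not values:
        return 1.0
    rx = re.compile(pattern)
    misses = sum(1 for v in values if rx.fullmatch(v) is None)
    return misses / len(values)


def coverage(pattern: str, store: DataStore, max_impurity: float) -> int:
    """Columns where the pattern 'applies' (assumed: impurity <= max_impurity)."""
    return sum(1 for values in store.values()
               if impurity(pattern, values) <= max_impurity)


def rank_patterns(store: DataStore, column: str,
                  max_impurity: float = 0.1) -> List[Tuple[str, float, int]]:
    """Rank all combined candidates for one column: low impurity, then high coverage."""
    scored = [(p, impurity(p, store[column]), coverage(p, store, max_impurity))
              for p in combined_candidates(store)]
    # Ties are broken by the pattern string so the order is deterministic.
    return sorted(scored, key=lambda s: (s[1], -s[2], s[0]))


def validate(pattern: str, new_values: List[str]) -> List[str]:
    """Return the new values that violate the selected pattern."""
    rx = re.compile(pattern)
    return [v for v in new_values if rx.fullmatch(v) is None]


if __name__ == "__main__":
    store = {
        "order_id": ["AB-123", "CD-456", "EF-789", "bad"],
        "sku": ["XY-001", "ZZ-999"],
        "qty": ["1", "20", "300"],
    }
    best, imp, cov = rank_patterns(store, "qty", max_impurity=0.25)[0]
    print(best, imp, cov)  # [0-9]+ 0.0 1
    print(validate(best, ["42", "4x2"]))  # ['4x2']
```

In this toy data store, the top-ranked pattern for `order_id` has an impurity of 0.25 because the row `"bad"` does not fit. Its coverage is 2 because the same pattern also fits `sku`. This shows how a coverage score can favor patterns that recur across columns. Whether the top pattern is accepted automatically or shown to a user for manual selection is left open, as it is in the abstract.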
Utility
25 Nov 2020
26 May 2022