Microsoft Corporation
METHOD AND SYSTEM FOR AUTOMATICALLY TAGGING DATA
Last updated:
Abstract:
Systems and methods relate to auto-tagging of data in a data lake or a data storage. Generating a statistical summary of the data lake and interactively receiving data in a selected column of an exemplar data addresses an issue of efficiently and accurately auto-tagging data in a data lake. The present disclosure automatically generates a statistical summary of the data lake using a lightweight off-line processing. A graphical user interface interactively receives an exemplar data file with a selection of a column in the exemplar data file. A list of candidate data-tagging patterns is generated based on the statistical summary and updates the list by removing candidate data-tagging patterns that under-generalize the data. The present disclosure determines a data-tagging pattern by selecting a candidate data-tagging profile from the list based on having the least number of matching columns in the data lake.
Utility
19 Nov 2020
19 May 2022