International Business Machines Corporation
DETERMINING METADATA OF A DATASET
Abstract:
The present disclosure relates to a method for enabling processing of a dataset of records having a set of attributes. The method comprises selecting a first attribute of the set of attributes and a subset of one or more second attributes of the set of attributes. Distinct values of the subset of second attributes may be determined from the dataset. For each of the determined distinct values, records of the dataset that have that distinct value may be identified, and a group of words may be formed from values of the first attribute of the identified records. Distinct word sequences may be identified in the formed groups, and a level of presence of each of the word sequences in each of the formed groups may be determined. At least part of the levels of presence may be provided as metadata.
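The steps in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function names `word_sequences` and `presence_levels` are hypothetical, records are assumed to be dictionaries, a word sequence is assumed to be a run of `n` consecutive whitespace-separated words within one value, and the "level of presence" is assumed to be the fraction of records in a group whose first-attribute value contains the sequence. The abstract does not fix any of these choices.

```python
from collections import defaultdict


def word_sequences(words, n):
    """Return the set of runs of n consecutive words (as tuples)."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def presence_levels(records, first_attr, second_attrs, n=1):
    """Map (distinct value, word sequence) to a level of presence.

    Assumption: the level is the share of records in the group whose
    first-attribute value contains the word sequence.
    """
    # Group records by the distinct value of the second attributes and
    # keep, per record, the word sequences found in the first attribute.
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[a] for a in second_attrs)
        words = str(rec[first_attr]).lower().split()
        groups[key].append(word_sequences(words, n))

    # Collect the distinct word sequences across all groups.
    all_sequences = set()
    for seq_sets in groups.values():
        for seqs in seq_sets:
            all_sequences |= seqs

    # Determine the level of presence of each sequence in each group.
    levels = {}
    for key, seq_sets in groups.items():
        for seq in all_sequences:
            hits = sum(1 for seqs in seq_sets if seq in seqs)
            levels[(key, seq)] = hits / len(seq_sets)
    return levels


if __name__ == "__main__":
    # Illustrative data only; not taken from the disclosure.
    sample = [
        {"desc": "red apple", "cat": "fruit"},
        {"desc": "green apple", "cat": "fruit"},
        {"desc": "red car", "cat": "vehicle"},
    ]
    for (value, seq), level in sorted(presence_levels(sample, "desc", ["cat"]).items()):
        print(value, seq, level)
```

In this sketch, a sequence that is present in every record of one group and absent from the others is a candidate for metadata describing that group. Other measures of presence, such as raw counts, would fit the same structure.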
Utility
29 Jul 2020
3 Feb 2022