Bank of America Corporation
System and Method for Ascertaining Data Labeling Accuracy in Supervised Learning Systems
Abstract:
Aspects of the disclosure relate to improving training data used for model generation. A computing platform may receive, from one or more data sources, a labelled data set. The computing platform may apply, to the labelled data set, an unsupervised learning algorithm, which may result in a clustered data set corresponding to the labelled data set. The computing platform may compare, for each data point in the labelled data set, the corresponding clustering information and labelling information to identify discrepancies. For data points with identified discrepancies between the corresponding clustering information and labelling information, the computing platform may flag a data labelling error. Using the data points without such discrepancies, the computing platform may train a supervised learning model. The computing platform may then store the trained supervised learning model.
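The flow described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the disclosure: k-means stands in for the unsupervised learning algorithm, a discrepancy is defined as a label that differs from the majority label of the point's cluster, a nearest-centroid classifier stands in for the supervised learning model, and the function names (`kmeans`, `flag_label_errors`, `train_nearest_centroid`) and the toy data are hypothetical. Storing the trained model is omitted.

```python
from collections import Counter, defaultdict
from math import dist


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialisation
    (an assumed stand-in for the unsupervised learning algorithm)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all centroids chosen so far.
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    assign = []
    for _ in range(iters):
        # Assign each point to its nearest centroid, then recompute centroids.
        assign = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = mean(members)
    return assign


def flag_label_errors(points, labels, k):
    """Return indices whose label disagrees with the majority label of
    their cluster (assumed definition of a discrepancy)."""
    clusters = kmeans(points, k)
    by_cluster = defaultdict(list)
    for c, y in zip(clusters, labels):
        by_cluster[c].append(y)
    majority = {c: Counter(ys).most_common(1)[0][0] for c, ys in by_cluster.items()}
    return [i for i, (c, y) in enumerate(zip(clusters, labels)) if majority[c] != y]


def train_nearest_centroid(points, labels):
    """Train a nearest-centroid classifier (assumed supervised model)
    and return it as a prediction function."""
    by_label = defaultdict(list)
    for p, y in zip(points, labels):
        by_label[y].append(p)
    centroids = {y: mean(ps) for y, ps in by_label.items()}
    return lambda p: min(centroids, key=lambda y: dist(p, centroids[y]))


# Hypothetical toy data: two well-separated groups, with the point at
# index 3 deliberately given the wrong label.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.3), (5.1, 5.0)]
labels = ["a", "a", "a", "b", "b", "b", "b", "b"]

flagged = flag_label_errors(points, labels, k=2)
clean = [i for i in range(len(points)) if i not in flagged]
model = train_nearest_centroid([points[i] for i in clean],
                               [labels[i] for i in clean])
```

In this sketch the flagged points are only excluded from training, not relabelled. The majority-label rule also assumes each cluster is dominated by one class, which may not hold for real data sets with overlapping classes or a poorly chosen number of clusters.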
Utility
8 Jan 2021
14 Jul 2022