International Business Machines Corporation
DATA-ANALYSIS-BASED, NOISY LABELED AND UNLABELED DATAPOINT DETECTION AND RECTIFICATION FOR MACHINE-LEARNING

Last updated: 24 Nov 2021

Abstract:

Noisy labeled and unlabeled datapoint detection and rectification in a training dataset for machine-learning is facilitated by a processor(s) obtaining a training dataset for use in training a machine-learning model. The processor(s) applies ensemble machine-learning and a generative model to the training dataset to detect noisy labeled datapoints in the training dataset, and create a clean dataset with preliminary labels added for any unlabeled datapoints in the training dataset. Data-driven active learning and the clean dataset are used by the processor(s) to facilitate generating an active-learned dataset with true labels added for one or more selected datapoints of a datapoint pool including the detected noisy labeled datapoints and the unlabeled datapoints of the training dataset. The machine-learning model is trained by the processor(s) using, at least in part, the clean dataset and the active-learned dataset.
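
The flow in the abstract can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes a toy ensemble of leave-one-out k-nearest-neighbor voters, and it uses the ensemble majority vote as a stand-in for the generative model when assigning preliminary labels. The names `knn_predict`, `split_noisy`, `active_learn`, and `oracle` are hypothetical, and the selection step simply takes the first datapoints of the pool instead of a data-driven strategy.

```python
from collections import Counter

def knn_predict(points, labels, query, k, skip):
    """Leave-one-out k-NN vote over the labeled datapoints, excluding index `skip`."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(p, query)), i)
        for i, (p, y) in enumerate(zip(points, labels))
        if i != skip and y is not None
    )
    votes = Counter(labels[i] for _, i in dists[:k])
    return votes.most_common(1)[0][0]

def split_noisy(points, labels, ks=(1, 3, 5)):
    """Return (clean, pool).

    clean: index -> label, with preliminary labels for unlabeled datapoints.
    pool:  indices of suspected-noisy and unlabeled datapoints.
    """
    clean, pool = {}, []
    for i, (p, y) in enumerate(zip(points, labels)):
        preds = [knn_predict(points, labels, p, k, i) for k in ks]
        majority, count = Counter(preds).most_common(1)[0]
        if y is None:
            # Assumption: the ensemble vote stands in for the generative model.
            clean[i] = majority
            pool.append(i)
        elif majority != y and count == len(ks):
            # Every ensemble member disagrees with the given label: treat as noisy.
            pool.append(i)
        else:
            clean[i] = y
    return clean, pool

def active_learn(pool, oracle, budget):
    """Ask a (hypothetical) oracle for true labels of up to `budget` pool datapoints."""
    return {i: oracle(i) for i in pool[:budget]}

# Toy data: two clusters, one mislabeled datapoint (index 3), one unlabeled (index 8).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
          (0.02, 0.03), (5.05, 5.05)]
labels = ["a", "a", "a", "b", "b", "b", "b", "b", None, "b"]

clean, pool = split_noisy(points, labels)
learned = active_learn(pool, {3: "a", 8: "a"}.__getitem__, budget=2)

# The model would then be trained on the clean dataset plus the active-learned labels.
training_labels = {**clean, **learned}
```

Requiring unanimous disagreement before flagging a label keeps the sketch conservative: a datapoint stays in the clean dataset unless every ensemble member contradicts its label.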

Status:

Application

Type:

Utility

Filing date:

13 May 2020

Issue date:

18 Nov 2021

Full patent description

Patent application document
