International Business Machines Corporation
Preventing Data Leakage in Automated Machine Learning

Last updated:

Abstract:

A mechanism is provided in a data processing system for preventing data leakage in automated machine learning. The mechanism receives a data set comprising a label for a target variable for a classifier machine learning model and a set of features. For each given feature in the set of features, the mechanism trains a subprime classifier model using the given feature as a target variable and remaining features as independent input features, tests the subprime classifier model, and records results of the subprime classifier model. The mechanism performs statistical analysis on the recorded results to identify an outlier result corresponding to an outlier subprime classifier model. The mechanism identifies an outlier feature within the set of features corresponding to the outlier subprime classifier model, removes the identified outlier feature from the set of features to form a modified set of features, and trains the classifier machine learning model using the label for the target variable and the modified set of features.
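The steps in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the claimed implementation: the abstract does not name a model type, a test metric, or a statistical test, so the one-rule classifier, the hold-out accuracy score, the z-score threshold, and every function name below (train_one_rule, subprime_score, find_outlier_features, remove_features, make_toy_features) are hypothetical choices made for illustration. The toy data is also invented: column 0 is derived from column 1, so its per-feature model should stand out.

```python
from collections import Counter, defaultdict
import statistics


def train_one_rule(X, y):
    """Assumed stand-in model: pick the single input column whose
    value -> majority-class lookup table best fits the training data."""
    default = Counter(y).most_common(1)[0][0]
    best = None  # (training hits, column index, lookup table)
    for col in range(len(X[0])):
        counts = defaultdict(Counter)
        for row, target in zip(X, y):
            counts[row[col]][target] += 1
        table = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        hits = sum(table[row[col]] == target for row, target in zip(X, y))
        if best is None or hits > best[0]:
            best = (hits, col, table)
    _, col, table = best
    # Unseen values fall back to the most common training class.
    return lambda row: table.get(row[col], default)


def subprime_score(features, target_idx, train_fraction=0.7):
    """Train a per-feature model with column target_idx as the target and
    the remaining columns as inputs; return its hold-out accuracy."""
    X = [[v for j, v in enumerate(row) if j != target_idx] for row in features]
    y = [row[target_idx] for row in features]
    cut = int(len(features) * train_fraction)
    predict = train_one_rule(X[:cut], y[:cut])
    hits = sum(predict(row) == t for row, t in zip(X[cut:], y[cut:]))
    return hits / (len(features) - cut)


def find_outlier_features(features, z_threshold=1.5):
    """Record one score per feature and flag scores far from the mean.
    The z-score test and its threshold are assumptions of this sketch."""
    scores = [subprime_score(features, j) for j in range(len(features[0]))]
    mean, spread = statistics.mean(scores), statistics.pstdev(scores)
    if spread == 0:
        return [], scores
    outliers = [j for j, s in enumerate(scores)
                if abs(s - mean) / spread > z_threshold]
    return outliers, scores


def remove_features(features, drop):
    """Form the modified feature set by dropping the flagged columns."""
    drop = set(drop)
    return [[v for j, v in enumerate(row) if j not in drop] for row in features]


def make_toy_features(n=1155):
    """Invented data: columns 1-4 are unrelated cycles; column 0 is
    computed from column 1, so it is fully predictable from it."""
    return [[int(i % 7 >= 4), i % 7, i % 5, i % 3, i % 11] for i in range(n)]


features = make_toy_features()
labels = [int(i % 5 >= 2) for i in range(len(features))]  # invented label

outliers, scores = find_outlier_features(features)
modified = remove_features(features, outliers)
# Final step: train the main classifier on the label and the modified set.
classifier = train_one_rule(modified, labels)
```

In this toy setup the per-feature model for column 0 reaches perfect hold-out accuracy while the others stay well below it, which is what the z-score flags. A real pipeline would substitute its own model family, metric, and outlier test.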

Status:

Application

Type:

Utility

Filing date:

21 Jul 2020

Issue date:

27 Jan 2022