International Business Machines Corporation
DATA MODEL PROCESSING IN MACHINE LEARNING USING A REDUCED SET OF FEATURES

Abstract:

A computer system trains a predictive model. A plurality of subsets of features are selected from a dataset comprising a plurality of cases and controls and a plurality of features. Cases and controls are matched to select a plurality of case-control subsets for each subset of features, each case-control subset having similar values for the corresponding subset of features. For each case-control subset, a statistical significance of each feature of the plurality of features absent from the subset of features used to match the case-control subset is identified. A final subset of features is selected based on satisfying a statistical significance of each feature for the plurality of case-control subsets. A predictive model is trained using the final subset of features. Embodiments of the present invention further include a method and program product for training a predictive model in substantially the same manner described above.
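The steps in the abstract can be illustrated with a small, self-contained sketch. It is not the claimed implementation. Several choices are assumptions made only for illustration, since the abstract does not specify them: greedy 1:1 matching within a fixed tolerance, a Welch t statistic with a fixed threshold as the significance test, the rule that a feature must be significant in every matched subset that tests it, and logistic regression as the predictive model. All function and parameter names (match_pairs, welch_t, select_features, train_logistic, tol, t_crit) are hypothetical, and the data is synthetic.

```python
import math
import random

from statistics import mean, stdev


def match_pairs(rows, labels, match_feats, tol):
    """Greedy 1:1 matching (an assumption): pair each case with an unused control
    whose values on every matching feature differ by at most tol."""
    controls = [i for i, y in enumerate(labels) if y == 0]
    used, matched = set(), []
    for c in (i for i, y in enumerate(labels) if y == 1):
        for k in controls:
            close = all(abs(rows[c][f] - rows[k][f]) <= tol for f in match_feats)
            if k not in used and close:
                used.add(k)
                matched.extend([c, k])
                break
    return matched


def welch_t(a, b):
    """Absolute Welch t statistic, used here as a stand-in significance score."""
    se2 = stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)
    return abs(mean(a) - mean(b)) / math.sqrt(se2) if se2 > 0 else 0.0


def select_features(rows, labels, feature_subsets, tol=0.5, t_crit=2.0):
    """Keep features that are significant in every matched subset that tests them."""
    votes = {f: [] for f in range(len(rows[0]))}
    for subset in feature_subsets:
        idx = match_pairs(rows, labels, subset, tol)
        cases = [rows[i] for i in idx if labels[i] == 1]
        ctrls = [rows[i] for i in idx if labels[i] == 0]
        if len(cases) < 2:  # too few pairs to estimate a variance
            continue
        for f in votes:
            if f not in subset:  # test only features absent from the matching set
                t = welch_t([r[f] for r in cases], [r[f] for r in ctrls])
                votes[f].append(t > t_crit)
    return [f for f, v in votes.items() if v and all(v)]


def predict(w, row, feats):
    """Logistic prediction using only the selected features (w[0] is the bias)."""
    z = w[0] + sum(wi * row[f] for wi, f in zip(w[1:], feats))
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))


def train_logistic(rows, labels, feats, lr=0.05, epochs=200):
    """Plain stochastic gradient ascent on the logistic log-likelihood."""
    w = [0.0] * (len(feats) + 1)
    for _ in range(epochs):
        for r, y in zip(rows, labels):
            err = y - predict(w, r, feats)
            w[0] += lr * err
            for j, f in enumerate(feats, start=1):
                w[j] += lr * err * r[f]
    return w


# Synthetic data: feature 0 differs between cases (label 1) and controls (label 0);
# features 1 and 2 are pure noise.
rng = random.Random(0)
labels = [i % 2 for i in range(200)]
rows = [[rng.gauss(2.0 * y, 1.0), rng.gauss(0, 1), rng.gauss(0, 1)] for y in labels]

selected = select_features(rows, labels, feature_subsets=[[0], [1], [2], [1, 2]])
weights = train_logistic(rows, labels, selected)
```

Matching on a subset of features holds those features roughly constant between cases and controls, so a difference that remains in some other feature is less likely to be explained by the matched ones. Requiring significance across several differently matched subsets is one way to read "satisfying a statistical significance of each feature for the plurality of case-control subsets"; a softer rule, such as a minimum fraction of subsets, would fit the same outline.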

Status:

Application

Type:

Utility

Filing date:

30 Apr 2020

Issue date:

4 Nov 2021