International Business Machines Corporation
DATA MODEL PROCESSING IN MACHINE LEARNING USING A REDUCED SET OF FEATURES
Abstract:
A computer system trains a predictive model. A plurality of subsets of features are selected from a dataset comprising a plurality of cases and controls and a plurality of features. Cases and controls are matched to select a plurality of case-control subsets for each subset of features, each case-control subset having similar values for the corresponding subset of features. For each case-control subset, a statistical significance of each feature of the plurality of features absent from the subset of features used to match the case-control subset is identified. A final subset of features is selected based on satisfying a statistical significance of each feature for the plurality of case-control subsets. A predictive model is trained using the final subset of features. Embodiments of the present invention further include a method and program product for training a predictive model in substantially the same manner described above.
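The selection steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say how cases and controls are matched, which significance test is used, or how "satisfying a statistical significance" is decided across subsets. The sketch assumes greedy 1:1 matching within a caliper, a paired test with a normal approximation, and a rule that keeps a feature only if it is significant in a minimum share of the matched subsets that test it. All names (`match_pairs`, `paired_p_value`, `select_features`, `caliper`, `alpha`, `min_share`) and the toy data are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def match_pairs(cases, controls, match_feats, caliper=0.5):
    """Greedy 1:1 matching (an assumption): pair each case with the closest
    unused control whose matching features all lie within the caliper."""
    pairs, used = [], set()
    for case in cases:
        best, best_dist = None, None
        for j, ctrl in enumerate(controls):
            if j in used:
                continue
            dist = max(abs(case[f] - ctrl[f]) for f in match_feats)
            if dist <= caliper and (best is None or dist < best_dist):
                best, best_dist = j, dist
        if best is not None:
            used.add(best)
            pairs.append((case, controls[best]))
    return pairs


def paired_p_value(pairs, feat):
    """Two-sided p-value for the mean case-minus-control difference,
    using a normal approximation (an assumed choice of test)."""
    diffs = [case[feat] - ctrl[feat] for case, ctrl in pairs]
    if len(diffs) < 2:
        return 1.0
    sd = stdev(diffs)
    if sd == 0:
        return 1.0 if mean(diffs) == 0 else 0.0
    z = mean(diffs) / (sd / len(diffs) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def select_features(cases, controls, features, match_subsets,
                    alpha=0.05, min_share=1.0):
    """Keep features that are significant in at least min_share of the
    matched case-control subsets in which they were tested."""
    hits = {f: 0 for f in features}
    tests = {f: 0 for f in features}
    for match_feats in match_subsets:
        pairs = match_pairs(cases, controls, match_feats)
        for feat in features:
            if feat in match_feats:
                continue  # only test features absent from the matching subset
            tests[feat] += 1
            if paired_p_value(pairs, feat) < alpha:
                hits[feat] += 1
    return [f for f in features if tests[f] and hits[f] / tests[f] >= min_share]


# Hypothetical toy data: only "marker" differs between cases and controls.
cases = [{"age": 40 + i, "marker": 5.0 + (i % 3) * 0.1, "noise": i % 4}
         for i in range(20)]
controls = [{"age": 40 + i, "marker": 1.0 + (i % 3) * 0.1, "noise": (i + 1) % 4}
            for i in range(20)]

final_features = select_features(
    cases, controls,
    features=["age", "marker", "noise"],
    match_subsets=[("age",), ("noise",)],
)
print(final_features)
```

On this toy data only `marker` is kept, because it stays significant whether the pairs are matched on `age` or on `noise`. The final training step is omitted here. Under the same assumptions, the returned feature list would be passed to whatever learner is being trained.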
Utility
30 Apr 2020
4 Nov 2021