Fair Isaac Corporation
LATENT FEATURE DIMENSIONALITY BOUNDS FOR ROBUST MACHINE LEARNING ON HIGH DIMENSIONAL DATASETS
Abstract:
Computer-implemented methods and systems are provided for quantifying the appropriate machine learning model complexity for a training dataset. The method comprises: monitoring, using one or more processors, N observed variables, v_1 through v_N, of a training dataset for a machine learning model; translating the N observed variables into m equisized bin indexes, which generate m^N possible equisized hypercells, to estimate a fundamental dimensionality for the dataset; generating one or more samples by assigning a record in the dataset a number from j through k as its set id; generating a merged sample S_i for one or more values of the set id i, where i goes from j to k; and computing a fractal dimension of the equisized hypercube phase space based on the count of cells with data coverage of at least one data point.
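The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not give the exact formula, so the sketch assumes a standard box-counting estimate (the slope of log occupied-cell count against log m), assumes that the merged sample S_i is the union of the records with set ids j through i, and assumes that set ids are assigned at random. All function names (`to_bin_indexes`, `count_occupied`, `assign_set_ids`, `merged_sample_mask`, `box_counting_dimension`) are hypothetical.

```python
import numpy as np

def to_bin_indexes(data, m):
    """Map each of the N observed variables to one of m equal-width bin indexes."""
    data = np.asarray(data, dtype=float)
    lo = data.min(axis=0)
    span = data.max(axis=0) - lo
    span[span == 0] = 1.0                      # constant columns all land in bin 0
    idx = np.floor((data - lo) / span * m).astype(int)
    return np.clip(idx, 0, m - 1)              # the column maximum goes in the last bin

def count_occupied(bin_idx):
    """Count hypercells (out of m**N possible) holding at least one record."""
    return len(np.unique(bin_idx, axis=0))

def assign_set_ids(n_records, j, k, seed=0):
    """Assumption: each record gets a uniformly random set id in j..k."""
    rng = np.random.default_rng(seed)
    return rng.integers(j, k + 1, size=n_records)

def merged_sample_mask(set_ids, i):
    """Assumption: S_i is the union of all records whose set id is at most i."""
    return set_ids <= i

def box_counting_dimension(data, bin_counts=(2, 4, 8, 16)):
    """Assumed estimator: least-squares slope of log(occupied cells) vs log(m)."""
    log_m = np.log(np.asarray(bin_counts, dtype=float))
    log_cells = np.log([count_occupied(to_bin_indexes(data, m)) for m in bin_counts])
    slope, _intercept = np.polyfit(log_m, log_cells, 1)
    return float(slope)

# Toy data: 1000 records with N = 3 variables that all lie on one line,
# so the data is intrinsically one-dimensional despite N = 3.
t = np.linspace(0.0, 1.0, 1000)
line = np.column_stack([t, t, t])
dim = box_counting_dimension(line)

# Cell coverage of the merged samples S_1 .. S_5 at m = 8 bins per variable.
set_ids = assign_set_ids(len(line), j=1, k=5)
idx = to_bin_indexes(line, m=8)
coverage = [count_occupied(idx[merged_sample_mask(set_ids, i)]) for i in range(1, 6)]
```

On this toy data the estimated dimension comes out close to 1, well below N = 3, and the coverage count can only grow or stay flat as more sets are merged. Bin edges are computed once from the full dataset so that every merged sample is counted against the same hypercell grid.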
Utility
30 Jun 2020
30 Dec 2021