VMware, Inc.
Inter-Feature Influence in Unlabeled Datasets
Last updated:
Abstract:
In one set of embodiments, a computer system can receive an unlabeled dataset comprising a plurality of unlabeled data instances, each unlabeled data instance including values for a plurality of features. The computer system can train, for each feature, a supervised machine learning (ML) model on a labeled dataset derived from the unlabeled dataset, where the labeled dataset comprises a plurality of labeled data instances, and wherein each labeled data instance includes (1) a label corresponding to a value for the feature in an unlabeled data instance of the unlabeled dataset, and (2) values for other features in the unlabeled data instance. The computer system can then compute, for each pair of first and second features in the plurality of features, an inter-feature influence score using the trained supervised ML model for the second feature, the inter-feature influence score indicating how useful the first feature is in predicting the second feature.
Utility
8 Dec 2020
9 Jun 2022