Microsoft Corporation
Feature generation pipeline for machine learning
Abstract:
Techniques for implementing a feature generation pipeline for machine learning are provided. In one technique, multiple jobs are executed, each of which computes the set of feature values for a different one of multiple features associated with videos. A feature registry that lists each of the multiple features is stored. After the jobs are executed and the feature registry is stored, a model specification is received that indicates a set of features for a model. For each feature in a subset of that set, a location in storage where a value for the feature is found is identified, and the value is retrieved from that location. A feature vector is then created that comprises, for each feature in the set of features, the value that corresponds to that feature. The feature vector is used either to train the model or as input to the model.
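The flow in the abstract (per-feature jobs, a feature registry, a model specification, location lookup, feature-vector assembly) can be illustrated with a minimal Python sketch. It is not the patented implementation. The registry schema (feature name mapped to a storage location), the `features/<name>` location scheme, the in-memory dict standing in for storage, the default value used for features outside the registered subset, and the names `run_feature_jobs` and `build_feature_vector` are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in for storage: location key -> {video_id: feature value}
Storage = Dict[str, Dict[str, float]]


@dataclass
class FeatureRegistry:
    """Lists each feature and where its precomputed values live (assumed schema)."""
    locations: Dict[str, str] = field(default_factory=dict)

    def register(self, feature: str, location: str) -> None:
        self.locations[feature] = location


def run_feature_jobs(jobs: Dict[str, Callable[[List[str]], Dict[str, float]]],
                     video_ids: List[str], storage: Storage,
                     registry: FeatureRegistry) -> None:
    """Run one job per feature, store its values, and record the feature in the registry."""
    for feature, job in jobs.items():
        location = f"features/{feature}"  # hypothetical location naming scheme
        storage[location] = job(video_ids)
        registry.register(feature, location)


def build_feature_vector(model_spec: List[str], video_id: str, storage: Storage,
                         registry: FeatureRegistry, default: float = 0.0) -> List[float]:
    """Assemble a vector in model_spec order by looking up each feature's storage location."""
    vector: List[float] = []
    for feature in model_spec:
        location = registry.locations.get(feature)
        if location is None:
            # Assumption: features outside the registered subset fall back to a default.
            vector.append(default)
        else:
            vector.append(storage[location].get(video_id, default))
    return vector


# Usage example with made-up data for two videos.
durations = {"v1": 120.0, "v2": 45.0}
views = {"v1": 1000.0, "v2": 250.0}
jobs = {
    "duration_sec": lambda ids: {v: durations[v] for v in ids},
    "view_count": lambda ids: {v: views[v] for v in ids},
}
storage: Storage = {}
registry = FeatureRegistry()
run_feature_jobs(jobs, ["v1", "v2"], storage, registry)

spec = ["view_count", "duration_sec", "likes"]  # "likes" has no registered job
print(build_feature_vector(spec, "v2", storage, registry))  # [250.0, 45.0, 0.0]
```

Keeping the registry separate from the stored values means a model specification only needs feature names; the pipeline resolves where each value lives at vector-assembly time, and the vector's order follows the specification rather than the order in which the jobs ran.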
Utility
30 Jun 2018
7 Dec 2021