Biodesix, Inc.
Bagged filtering method for selection and deselection of features for classification
Abstract:
Classifier generation methods are described in which features used in classification (e.g., mass spectral peaks) are selected or deselected using bagged filtering. A development sample set is split into two subsets, one of which is used as a training set and the other of which is set aside. A classifier (e.g., K-nearest neighbor, decision tree, margin-based classifier, or other) is defined using the training subset and at least one of the features (or a subset of two or more features in combination). The classifier is applied to a subset of samples. A filter is applied to the performance of the classifier on that sample subset, and the at least one feature is added to a "filtered feature list" if the classifier performance passes the filter. This process is repeated for many different realizations of the separation of the development sample set into two subsets and, for each realization, for different features or sets of features in combination. After all iterations are performed, the filtered feature list is used to either select or deselect features for a final classifier.
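The loop described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation. It makes several assumptions that the abstract does not specify: features are tested one at a time, the classifier is a one-dimensional K-nearest neighbor vote, performance is measured as accuracy on the set-aside subset, the filter is a simple accuracy threshold, and a feature is selected when it passes in at least half of the split realizations. The names `knn_predict`, `bagged_filter`, `select_features`, `min_accuracy`, and `min_pass_fraction` are hypothetical, and the data are a toy set.

```python
import random
from collections import Counter


def knn_predict(train, x, k=3):
    """Majority vote among the k training points nearest to x (one feature)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def bagged_filter(X, y, n_splits=50, k=3, min_accuracy=0.8, seed=0):
    """Count, per feature index, how many split realizations pass the filter."""
    rng = random.Random(seed)
    n_samples, n_features = len(X), len(X[0])
    passes = Counter()
    for _ in range(n_splits):
        # One realization of the split: half for training, half set aside.
        idx = list(range(n_samples))
        rng.shuffle(idx)
        half = n_samples // 2
        train_idx, held_out_idx = idx[:half], idx[half:]
        for f in range(n_features):
            train = [(X[i][f], y[i]) for i in train_idx]
            # Assumption: performance is accuracy on the set-aside subset.
            correct = sum(
                knn_predict(train, X[i][f], k) == y[i] for i in held_out_idx
            )
            # Assumed filter: a plain accuracy threshold.
            if correct / len(held_out_idx) >= min_accuracy:
                passes[f] += 1
    return passes


def select_features(passes, n_splits, min_pass_fraction=0.5):
    """Keep features that passed in a large enough share of realizations."""
    return sorted(
        f for f, count in passes.items() if count / n_splits >= min_pass_fraction
    )


# Toy data: feature 0 separates the two classes, feature 1 carries no signal.
y = [0] * 20 + [1] * 20
X = [[10.0 * label + 0.1 * (i % 5), float(i % 2)] for i, label in enumerate(y)]

passes = bagged_filter(X, y, n_splits=50)
selected = select_features(passes, n_splits=50)
print("pass counts:", dict(passes), "selected:", selected)
```

To deselect rather than select, the same pass counts could be used in the opposite direction, for example by dropping features that pass a filter designed to flag an unwanted property. Testing combinations of two or more features would require a multi-dimensional distance in place of the one-dimensional one used here.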
Utility
5 Apr 2016
14 Jul 2020