Fortinet, Inc.
CONVEX OPTIMIZED STOCHASTIC VECTOR SAMPLING BASED REPRESENTATION OF GROUND TRUTH
Last updated:
Abstract:
Systems and methods are described for training a machine learning model using intelligently selected multiclass vectors. According to an embodiment, a processing resource of a computing system receives a first set of un-labeled feature vectors. The first set feature vectors are homomorphically translated using a T-Distributed Stochastic Neighbor Embedding (t-SNE) algorithm to obtain a second set of feature vectors with reduced dimensionality. The second set of feature vectors are clustered to obtain an initial set of clusters using centroid-based clustering. An optimal set of clusters is identified among the initial set of clusters by performing a convex optimization process on the initial set of clusters. For each cluster of the optimal set of clusters, a representative vector from the cluster is selected for labeling.
Utility
11 Sep 2020
17 Mar 2022