The Boeing Company
Dimension optimization in singular value decomposition-based topic models
Abstract:
Techniques are described for optimizing the number of dimensions used in a singular value decomposition (SVD) factorization. Embodiments tokenize each of a plurality of documents into a respective set of terms. For each of a plurality of dimension counts, embodiments (i) perform the SVD factorization to determine a respective plurality of dimensions corresponding to the dimension count; (ii) determine, for each of the plurality of documents, a respective set of dimension weights for each of the plurality of dimensions; (iii) calculate an average top dimension weight across the sets of dimension weights for the plurality of documents; and (iv) calculate an average inverse top dimension top term ranking across the sets of dimension weights for the plurality of documents. An optimal number of dimensions is then calculated based on the average top dimension weight and the average inverse top dimension top term ranking.
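The loop described in the abstract can be illustrated with a minimal Python sketch using NumPy. The abstract does not define the two metrics precisely or say how they are combined, so the following are assumptions made only for illustration: a document's dimension weights are its normalized absolute projections onto the retained singular directions; the "inverse top dimension top term ranking" is taken as 1/rank of the document's most frequent term within the term loadings of the document's top dimension; and the two averages are combined by a simple product. All function names (`tokenize`, `build_term_doc_matrix`, `score_dimension_count`, `choose_dimension_count`) are hypothetical and do not come from the patent.

```python
import re
import numpy as np


def tokenize(doc):
    """Split a document into lowercase alphabetic terms."""
    return re.findall(r"[a-z]+", doc.lower())


def build_term_doc_matrix(docs):
    """Return a (terms x documents) raw count matrix and the vocabulary."""
    token_lists = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in token_lists for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, toks in enumerate(token_lists):
        for t in toks:
            A[index[t], j] += 1.0
    return A, vocab


def score_dimension_count(A, k):
    """Compute both averages for one candidate dimension count k."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

    # Assumed definition: per-document dimension weights are the absolute
    # projections onto each retained dimension, normalized to sum to 1.
    W = np.abs(s_k[:, None] * Vt_k)          # shape (k, n_docs)
    col_sums = W.sum(axis=0)
    col_sums[col_sums == 0] = 1.0            # avoid division by zero
    W = W / col_sums

    top_dim = W.argmax(axis=0)               # top dimension per document
    avg_top_weight = float(W.max(axis=0).mean())

    # Assumed definition: rank of the document's most frequent term among
    # the term loadings of the document's top dimension, then inverted.
    inv_ranks = []
    for j in range(A.shape[1]):
        top_term = A[:, j].argmax()
        loadings = np.abs(U_k[:, top_dim[j]])
        order = np.argsort(-loadings)        # term indices, largest first
        rank = int(np.where(order == top_term)[0][0]) + 1
        inv_ranks.append(1.0 / rank)
    return avg_top_weight, float(np.mean(inv_ranks))


def choose_dimension_count(docs, candidates):
    """Pick the candidate k with the highest combined score."""
    A, _ = build_term_doc_matrix(docs)
    scores = {}
    for k in candidates:
        avg_weight, avg_inv_rank = score_dimension_count(A, k)
        # Assumed combination rule: product of the two averages.
        scores[k] = avg_weight * avg_inv_rank
    best = max(scores, key=scores.get)
    return best, scores
```

Calling `choose_dimension_count(docs, [1, 2, 3])` on a small list of strings returns the selected dimension count together with the per-candidate scores. Candidate counts should not exceed the smaller dimension of the term-document matrix; a real system would likely also apply term weighting (for example TF-IDF) and a truncated SVD solver, neither of which is shown here.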
Utility
30 Mar 2017
10 Sep 2019