Honda Motor Co., Ltd.
DOCUMENT ANALYSIS SYSTEM
Abstract:
Provided is a system that determines an appropriate topic count for latent Dirichlet allocation (LDA) when estimating the latent meanings of a document. For each of a plurality of documents d, a perplexity PPL is evaluated from the document generation probability, that is, the probability that the document d is generated, while the topic count N defining an LDA-based topic model (used as the document generation model) is hypothetically set to different values and the word groups are specified by different random numbers. The topic model is then defined by a reference topic count N.sub.0, which is determined by combining a first topic count N.sub.1 (the topic count with the highest cumulative frequency of being the point at which the perplexity PPL first reaches a minimum) and a second topic count N.sub.2 (the topic count with the highest cumulative frequency of being the point at which the perplexity PPL takes its smallest value).
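The selection of N.sub.1, N.sub.2 and N.sub.0 described in the abstract can be illustrated with a minimal sketch. It assumes the perplexity values have already been computed and are supplied as one curve per trial (one trial per random number), indexed by candidate topic count. The function names, the reading of "first indicates a minimum value" as the first local minimum, the tie-breaking toward the smaller topic count, and the rounded-mean rule for combining N.sub.1 and N.sub.2 are all assumptions made for illustration; the abstract does not specify how the two counts are combined.

```python
from collections import Counter
from typing import Callable, Iterable, Sequence, Tuple


def first_local_min(topic_counts: Sequence[int], ppl: Sequence[float]) -> int:
    """Topic count at which the perplexity curve first stops decreasing.

    Assumption: "first minimum" is read as the first local minimum.
    If the curve never rises, the last candidate is returned.
    """
    for i in range(len(ppl) - 1):
        if ppl[i + 1] > ppl[i]:
            return topic_counts[i]
    return topic_counts[-1]


def global_min(topic_counts: Sequence[int], ppl: Sequence[float]) -> int:
    """Topic count at which the perplexity curve takes its smallest value."""
    best_index = min(range(len(ppl)), key=lambda i: ppl[i])
    return topic_counts[best_index]


def most_frequent(values: Iterable[int]) -> int:
    """Value with the highest frequency; ties go to the smaller value (assumption)."""
    counts = Counter(values)
    best = max(counts.values())
    return min(v for v, c in counts.items() if c == best)


def reference_topic_count(
    topic_counts: Sequence[int],
    ppl_curves: Sequence[Sequence[float]],
    # Hypothetical combination rule: rounded mean of N1 and N2.
    combine: Callable[[int, int], int] = lambda n1, n2: round((n1 + n2) / 2),
) -> Tuple[int, int, int]:
    """Return (N1, N2, N0) from perplexity curves gathered over several trials."""
    n1 = most_frequent(first_local_min(topic_counts, c) for c in ppl_curves)
    n2 = most_frequent(global_min(topic_counts, c) for c in ppl_curves)
    return n1, n2, combine(n1, n2)


# Made-up perplexity curves for three trials over five candidate topic counts.
candidates = [2, 4, 6, 8, 10]
curves = [
    [90.0, 70.0, 75.0, 60.0, 65.0],
    [95.0, 72.0, 74.0, 58.0, 61.0],
    [88.0, 80.0, 69.0, 71.0, 66.0],
]
n1, n2, n0 = reference_topic_count(candidates, curves)
print(n1, n2, n0)  # 4 8 6
```

The `combine` parameter is left pluggable so that a different rule (for example, taking the larger of the two counts) can be substituted without changing the frequency counting.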
Utility
22 Feb 2021
26 Aug 2021