Palo Alto Networks, Inc.
AUTOMATED CONTENT TAGGING WITH LATENT DIRICHLET ALLOCATION OF CONTEXTUAL WORD EMBEDDINGS
Last updated:
Abstract:
Dynamic content tags are generated as content is received by a dynamic content tagging system. A natural language processor (NLP) tokenizes the content and extracts contextual N-grams based on local or global context for the tokens in each document in the content. The contextual N-grams are used as input to a generative model that computes a weighted vector of likelihood values that each contextual N-gram corresponds to one of a set of unlabeled topics. A tag is generated for each unlabeled topic comprising the contextual N-gram having a highest likelihood to correspond to that unlabeled topic. Topic-based deep learning models having tag predictions below a threshold confidence level are retrained using the generated tags, and the retrained topic-based deep learning models dynamically tag the content.
Utility
25 Feb 2020
26 Aug 2021