International Business Machines Corporation
NATURAL LANGUAGE PROCESSING WITH MISSING TOKENS IN A CORPUS
Last updated:
Abstract:
Two text blocks are semantically compared, and a semantic score is provided to a user. The semantic score is based on application of a machine learning model trained on a text corpus. One or both of the text blocks may contain one or more words that do not appear in the training text corpus (skip-words). Skip-words are used, rather than discarded, to adjust the semantic score via, for example, a penalization function. The user provides feedback about the accuracy of the adjusted semantic score, and the feedback is used to perform supervised learning on the model.
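The skip-word adjustment described in the abstract can be illustrated with a minimal sketch. This is not the method claimed in the application. It assumes a bag-of-words cosine similarity as a stand-in for the trained model, and a linear penalty proportional to the fraction of skip-words. All names (split_skip_words, base_score, adjusted_score, penalty_weight) are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def split_skip_words(tokens, vocabulary):
    """Separate tokens found in the training vocabulary from skip-words."""
    known = [t for t in tokens if t in vocabulary]
    skipped = [t for t in tokens if t not in vocabulary]
    return known, skipped


def base_score(known_a, known_b):
    """Cosine similarity over token counts; stands in for the trained model."""
    counts_a, counts_b = Counter(known_a), Counter(known_b)
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def adjusted_score(text_a, text_b, vocabulary, penalty_weight=0.5):
    """Scale the base score down by the share of skip-words (assumed penalty)."""
    tokens_a, tokens_b = tokenize(text_a), tokenize(text_b)
    known_a, skip_a = split_skip_words(tokens_a, vocabulary)
    known_b, skip_b = split_skip_words(tokens_b, vocabulary)
    total = len(tokens_a) + len(tokens_b)
    skip_fraction = (len(skip_a) + len(skip_b)) / total if total else 0.0
    return base_score(known_a, known_b) * (1.0 - penalty_weight * skip_fraction)
```

In this sketch, user feedback on the adjusted score could be collected as labeled examples and used to tune penalty_weight or retrain the underlying model. How the application actually uses the feedback is not specified beyond the abstract.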
Status:
Application
Type:
Utility
Filing date:
23 Mar 2020
Issue date:
23 Sep 2021