International Business Machines Corporation
NATURAL LANGUAGE PROCESSING WITH MISSING TOKENS IN A CORPUS

Last updated:

Abstract:

Two text blocks are semantically compared, and a semantic score is provided to a user. The semantic score is based on application of a machine learning model trained on a text corpus. One or both of the text blocks may contain one or more words that do not appear in the training text corpus (skip-words). Skip-words are used, rather than discarded, to adjust the semantic score via, for example, a penalization function. The user provides feedback about the accuracy of the adjusted semantic score, and the feedback is used to perform supervised learning on the model.
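
The skip-word adjustment described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the application: the toy embedding table `VOCAB`, the helper names, the use of cosine similarity over averaged word vectors as the raw semantic score, and the linear penalty controlled by the hypothetical parameter `penalty_weight`.

```python
import math

# Hypothetical toy vocabulary standing in for a model trained on a corpus:
# each known word maps to a small embedding vector (values are made up).
VOCAB = {
    "cat": (1.0, 0.0),
    "dog": (0.8, 0.6),
    "car": (0.0, 1.0),
}


def split_tokens(text):
    """Return (known tokens, skip-words); skip-words are tokens absent from VOCAB."""
    tokens = text.lower().split()
    known = [t for t in tokens if t in VOCAB]
    skipped = [t for t in tokens if t not in VOCAB]
    return known, skipped


def mean_vector(tokens):
    """Average the embeddings of the known tokens (zero vector if there are none)."""
    dims = len(next(iter(VOCAB.values())))
    if not tokens:
        return (0.0,) * dims
    return tuple(sum(VOCAB[t][i] for t in tokens) / len(tokens) for i in range(dims))


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def adjusted_score(text_a, text_b, penalty_weight=0.5):
    """Raw similarity over known words, scaled down by the share of skip-words.

    The linear penalty here is an assumed example of a penalization function.
    """
    known_a, skip_a = split_tokens(text_a)
    known_b, skip_b = split_tokens(text_b)
    raw = cosine(mean_vector(known_a), mean_vector(known_b))
    total = len(known_a) + len(skip_a) + len(known_b) + len(skip_b)
    skip_fraction = (len(skip_a) + len(skip_b)) / total if total else 0.0
    return raw * (1.0 - penalty_weight * skip_fraction)
```

In this sketch, a skip-word lowers the score instead of being silently dropped: comparing "cat zzz" with "cat" scores below comparing "cat" with "cat", because "zzz" is not in the toy vocabulary. User feedback on the adjusted scores could, under the same assumptions, serve as labels for tuning `penalty_weight`.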

Status:

Application

Type:

Utility

Filing date:

23 Mar 2020

Issue date:

23 Sep 2021