International Business Machines Corporation
NATURAL LANGUAGE PROCESSING WITH MISSING TOKENS IN A CORPUS

Last updated: 29 Sep 2021

Abstract:

Two text blocks are semantically compared, and a semantic score is provided to a user. The semantic score is based on application of a machine learning model trained on a text corpus. One or both of the text blocks may contain one or more words that do not appear in the training text corpus (skip-words). Rather than being discarded, the skip-words are used to adjust the semantic score via, for example, a penalization function. The user provides feedback about the accuracy of the adjusted semantic score, and the feedback is used to perform supervised learning on the model.
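
The skip-word adjustment described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the application: the toy `EMBEDDINGS` table stands in for a model trained on a corpus, the raw score is a cosine similarity of averaged word vectors, and the penalization function is a simple linear scaling by the fraction of skip-words, controlled by a hypothetical `alpha` parameter. The user-feedback and supervised-learning step is not shown.

```python
import math

# Hypothetical toy embeddings standing in for a model trained on a text corpus.
EMBEDDINGS = {
    "cat": (1.0, 0.0),
    "dog": (0.8, 0.6),
    "car": (0.0, 1.0),
}


def mean_vector(tokens):
    """Average the vectors of the tokens the model knows; None if it knows none."""
    vecs = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vecs:
        return None
    dim = len(vecs[0])
    return tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(dim))


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def adjusted_score(text_a, text_b, alpha=0.5):
    """Return (adjusted semantic score, skip-words) for two text blocks.

    Assumed penalization function: raw * (1 - alpha * skip_fraction), where
    skip_fraction is the share of tokens missing from the vocabulary.
    """
    tokens_a = text_a.lower().split()
    tokens_b = text_b.lower().split()
    all_tokens = tokens_a + tokens_b

    # Skip-words are kept and counted rather than silently discarded.
    skip_words = [t for t in all_tokens if t not in EMBEDDINGS]

    vec_a, vec_b = mean_vector(tokens_a), mean_vector(tokens_b)
    raw = 0.0 if vec_a is None or vec_b is None else cosine(vec_a, vec_b)

    skip_fraction = len(skip_words) / len(all_tokens) if all_tokens else 0.0
    penalty = 1.0 - alpha * skip_fraction
    return raw * penalty, skip_words


if __name__ == "__main__":
    score, skipped = adjusted_score("cat zyx", "cat")
    print(f"adjusted score: {score:.3f}, skip-words: {skipped}")
```

In this sketch, a larger `alpha` lowers the score more sharply as the share of unknown words grows, which is one way to signal reduced confidence to the user. Other penalization functions would fit the same structure.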

Status:

Application

Type:

Utility

Filing date:

23 Mar 2020

Issue date:

23 Sep 2021

Full patent description

Patent application document

International Business Machines Corporation NATURAL LANGUAGE PROCESSING WITH MISSING TOKENS IN A CORPUS
