International Business Machines Corporation
PREPARING DOCUMENTS FOR COREFERENCE ANALYSIS

Last updated: 13 Jul 2022

Abstract:

Unstructured text is identified as larger than a threshold size. Named-entity recognition analysis is executed on the unstructured text. One or more anchor entities of the unstructured text are determined that each occur more than a threshold number of times within the unstructured text. Two or more instances of the one or more anchor entities that are separated by at least a threshold amount of text of the unstructured text are identified. The unstructured text is partitioned into at least three sections. The unstructured text is partitioned at respective natural language demarcation points associated with each of the two or more instances such that each of the at least three sections is smaller than the threshold size. Separate coreference analyses are performed in parallel on each of the at least three sections.
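The steps in the abstract can be illustrated with a minimal Python sketch. It is not the applicant's implementation, and several details are assumptions made only for illustration. A simple gazetteer lookup stands in for a real named-entity recognizer. Sentence boundaries (a regex on ".", "!" or "?" followed by whitespace) stand in for the "natural language demarcation points." All thresholds are measured in characters. Separation is measured between successive chosen anchor instances, and the first one is measured from the start of the text. The functions recognize_entities, sentence_start, partition, resolve_coreferences and analyze are hypothetical names, and resolve_coreferences is a placeholder rather than a real coreference model.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Assumed demarcation point: end of a sentence followed by whitespace.
SENTENCE_END = re.compile(r"[.!?]\s+")


def recognize_entities(text, gazetteer):
    """Hypothetical NER stand-in: exact-match lookup of known names.
    Returns (name, offset) pairs sorted by offset."""
    hits = []
    for name in gazetteer:
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            hits.append((name, m.start()))
    return sorted(hits, key=lambda h: h[1])


def sentence_start(text, offset):
    """Offset where the sentence containing `offset` begins (0 if none)."""
    start = 0
    for m in SENTENCE_END.finditer(text, 0, offset):
        start = m.end()
    return start


def partition(text, gazetteer, max_size, min_count, min_gap):
    """Split text at sentence starts preceding well-separated anchor instances."""
    if len(text) <= max_size:
        return [text]  # not larger than the threshold size: leave whole
    hits = recognize_entities(text, gazetteer)
    counts = Counter(name for name, _ in hits)
    # Anchor entities occur more than `min_count` times.
    anchors = {name for name, n in counts.items() if n > min_count}
    cuts, last = [], 0
    for name, offset in hits:
        if name in anchors and offset - last >= min_gap:
            cut = sentence_start(text, offset)
            if cut > (cuts[-1] if cuts else 0):  # skip empty or repeated cuts
                cuts.append(cut)
                last = offset
    if len(cuts) < 2:
        raise ValueError("fewer than two usable anchor instances")
    bounds = [0, *cuts, len(text)]
    sections = [text[a:b] for a, b in zip(bounds, bounds[1:])]
    if any(len(s) >= max_size for s in sections):
        raise ValueError("a section is not smaller than max_size")
    return sections


def resolve_coreferences(section):
    """Hypothetical placeholder for a coreference model: it only collects
    third-person pronouns so the pipeline runs end to end."""
    return re.findall(r"\b(?:he|she|it|they|him|her|them)\b", section, re.IGNORECASE)


def analyze(text, gazetteer, max_size, min_count, min_gap):
    sections = partition(text, gazetteer, max_size, min_count, min_gap)
    with ThreadPoolExecutor() as pool:
        # map() returns results in the same order as the sections.
        return sections, list(pool.map(resolve_coreferences, sections))
```

For example, take the 192-character text "Alice founded Acme in Ohio. She hired Bob soon after. Acme grew quickly that year. Alice met Bob again in Texas. He joined the board of Acme. Alice later sold Acme to Bob. They stayed friends." Calling analyze on it with gazetteer ["Alice", "Bob", "Acme"], max_size=100, min_count=2 and min_gap=50 yields three sections. They begin at "Alice founded", "Acme grew" and "He joined", and each is shorter than 100 characters. Threads keep the sketch simple. A CPU-bound coreference model would more likely run in separate processes, for example with ProcessPoolExecutor.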

Status:

Application

Type:

Utility

Filing date:

31 Dec 2020

Issue date:

30 Jun 2022

Full patent description

