International Business Machines Corporation
PREPARING DOCUMENTS FOR COREFERENCE ANALYSIS
Last updated:
Abstract:
Unstructured text is identified as being larger than a threshold size. Named-entity recognition analysis is executed on the unstructured text. One or more anchor entities of the unstructured text are determined, each of which occurs more than a threshold number of times within the unstructured text. Two or more instances of the one or more anchor entities are identified that are separated by at least a threshold amount of text of the unstructured text. The unstructured text is partitioned into at least three sections at respective natural language demarcation points associated with each of the two or more instances, such that each of the at least three sections is smaller than the threshold size. Separate coreference analyses are performed in parallel on each of the at least three sections.
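The steps in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation. Every name in it is hypothetical: `find_entities` is a toy regular-expression stand-in for a real named-entity recognizer, `sentence_start` assumes that a sentence boundary serves as the natural language demarcation point, and `resolve_coreferences` is a placeholder that only counts pronouns instead of running an actual coreference model. The parameters `max_size`, `min_count`, and `min_gap` stand for the three thresholds the abstract mentions, measured here in characters and mentions as an assumption.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stoplist so the toy recognizer skips capitalized non-entities.
STOPWORDS = {"The", "He", "She", "It", "They"}


def find_entities(text):
    """Toy stand-in for NER: capitalized words are treated as entity mentions."""
    return [
        (m.group(), m.start())
        for m in re.finditer(r"\b[A-Z][a-z]+\b", text)
        if m.group() not in STOPWORDS
    ]


def sentence_start(text, pos):
    """Index where the sentence containing `pos` begins (assumed demarcation point)."""
    ends = [m.end() for m in re.finditer(r"[.!?]\s+", text[:pos])]
    return ends[-1] if ends else 0


def partition(text, max_size, min_count, min_gap):
    """Split `text` at sentence starts of well-separated anchor-entity mentions."""
    if len(text) <= max_size:
        return [text]  # small enough to analyze as a single unit

    mentions = find_entities(text)
    counts = Counter(name for name, _ in mentions)
    # Anchor entities occur more than `min_count` times.
    anchors = {name for name, n in counts.items() if n > min_count}

    cuts, last = [], 0
    for name, pos in mentions:
        if name not in anchors:
            continue
        cut = sentence_start(text, pos)
        # Keep only cut points at least `min_gap` characters past the previous one.
        if cut - last >= min_gap:
            cuts.append(cut)
            last = cut

    bounds = [0] + cuts + [len(text)]
    sections = [text[a:b] for a, b in zip(bounds, bounds[1:])]
    if len(sections) < 3 or any(len(s) >= max_size for s in sections):
        raise ValueError("could not form at least three sections under max_size")
    return sections


def resolve_coreferences(section):
    """Placeholder for a coreference pass: counts pronouns in the section."""
    return len(re.findall(r"\b(?:he|she|it|they)\b", section, flags=re.IGNORECASE))


def analyze_in_parallel(sections):
    """Run the placeholder coreference pass on every section concurrently."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(resolve_coreferences, sections))
```

Cutting at the start of a sentence that mentions an anchor entity means each section opens with an explicit reference to that entity, so pronouns later in the section are more likely to have their antecedent inside the same section. A thread pool is used here only for brevity; a CPU-bound coreference model would more plausibly run in separate processes or on separate workers.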
Utility
31 Dec 2020
30 Jun 2022