International Business Machines Corporation
Linguistically consistent document annotation

Abstract:

A system, method, and computer program product for text annotation. The system includes at least one processing component, at least one memory component, an annotation corpus, and a document processor. The document processor is configured to receive a document including at least one annotated text span. The at least one annotated text span is annotated with a type from a type system (e.g., a domain-specific type system). The document processor is also configured to extract linguistic features of the at least one annotated text span, and generate a type attribute for the type based on the extracted linguistic features. Further, the document processor is configured to receive a new annotated text span, which is annotated with the type. The document processor extracts linguistic features of the new annotated text span, and determines whether the linguistic features match the type attribute.
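The abstract describes a two-phase flow. First, a type attribute is learned from the linguistic features of spans already annotated with a type. Then, new spans annotated with that type are checked against the attribute. The Python sketch below illustrates this flow under stated assumptions. The names `AnnotatedSpan`, `extract_features`, and `TypeAttributeStore` are hypothetical. The "linguistic features" are a toy token count plus word-shape pattern, standing in for whatever parser-derived features a real system would use. The type attribute is simply the set of feature patterns observed for each type.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical span representation: the covered text plus its type from the type system.
@dataclass(frozen=True)
class AnnotatedSpan:
    text: str
    type_name: str


def extract_features(span_text: str) -> tuple:
    """Toy stand-in for linguistic feature extraction (assumption, not the patented method):
    the number of tokens and a coarse per-token shape (capitalized, digits, or other)."""
    tokens = span_text.split()
    shape = " ".join(
        "Xx" if t[:1].isupper() else "d" if t.isdigit() else "x" for t in tokens
    )
    return (len(tokens), shape)


class TypeAttributeStore:
    """Hypothetical store mapping each type to the feature patterns seen for it."""

    def __init__(self) -> None:
        self._attributes = defaultdict(set)

    def learn(self, spans) -> None:
        # Generate the type attribute from the extracted features of existing annotated spans.
        for span in spans:
            self._attributes[span.type_name].add(extract_features(span.text))

    def matches(self, span: AnnotatedSpan) -> bool:
        # Check whether a new span's features match the attribute learned for its type.
        return extract_features(span.text) in self._attributes.get(span.type_name, set())


if __name__ == "__main__":
    store = TypeAttributeStore()
    store.learn([AnnotatedSpan("aspirin", "Drug"), AnnotatedSpan("ibuprofen", "Drug")])
    print(store.matches(AnnotatedSpan("naproxen", "Drug")))        # True
    print(store.matches(AnnotatedSpan("took two pills", "Drug")))  # False
```

In this sketch, matching is exact set membership, a simplification chosen for brevity. A span whose features fall outside the learned set is only flagged as not matching the type attribute; what happens next is left open.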

Status:
Grant
Type:

Utility

Filing date:

7 Apr 2020

Issue date:

27 Jul 2021