International Business Machines Corporation
Descriptor Uniqueness for Entity Clustering

Abstract:

A mechanism is provided in a data processing system to implement a cognitive natural language processing (NLP) system with descriptor uniqueness identification to support named entity mention clustering. The mechanism annotates a set of documents from a corpus of documents for entity types and mentions, collects descriptor usages from all documents in the corpus of documents, analyzes the descriptor usages to classify the descriptors as base terms or modifier terms, generates compatibility scores for the descriptors, and performs entity merging of entity clusters based on the compatibility scores.
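
The abstract names the pipeline stages but not how each is computed. The following is a minimal sketch of one way the stages could fit together, not the method claimed in the application. Everything in it is an assumption made for illustration: the sample mentions, the rule that a term is a "base" when it mostly appears as the last word of a descriptor phrase, the Jaccard-based compatibility score, the 0.5 threshold, and all function names.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: (cluster_id, descriptor phrase) pairs such as an entity
# annotator might produce. The data is invented for this example.
MENTIONS = [
    ("c1", "vice president"),
    ("c2", "senior vice president"),
    ("c3", "president"),
    ("c4", "chief engineer"),
    ("c5", "engineer"),
]


def classify_terms(phrases):
    """Assumed heuristic: a term is 'base' if at least half of its usages
    are as the final word of a descriptor phrase, otherwise 'modifier'."""
    head, total = Counter(), Counter()
    for phrase in phrases:
        words = phrase.split()
        for i, word in enumerate(words):
            total[word] += 1
            if i == len(words) - 1:
                head[word] += 1
    return {w: "base" if head[w] / total[w] >= 0.5 else "modifier" for w in total}


def compatibility(a, b, roles):
    """Assumed score: 0.0 when the base terms differ, otherwise the Jaccard
    overlap of the modifier sets (1.0 when neither phrase has modifiers)."""
    words_a, words_b = set(a.split()), set(b.split())
    base_a = {w for w in words_a if roles[w] == "base"}
    base_b = {w for w in words_b if roles[w] == "base"}
    if base_a != base_b:
        return 0.0
    mods_a, mods_b = words_a - base_a, words_b - base_b
    union = mods_a | mods_b
    return 1.0 if not union else len(mods_a & mods_b) / len(union)


def merge_clusters(mentions, threshold=0.5):
    """Merge clusters whose descriptors score at or above the threshold,
    using a small union-find structure."""
    roles = classify_terms(phrase for _, phrase in mentions)
    parent = {cid: cid for cid, _ in mentions}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (ca, pa), (cb, pb) in combinations(mentions, 2):
        if compatibility(pa, pb, roles) >= threshold:
            parent[find(cb)] = find(ca)

    groups = {}
    for cid in parent:
        groups.setdefault(find(cid), set()).add(cid)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    print(merge_clusters(MENTIONS))
```

Under these assumptions, "vice president" and "senior vice president" share the base term "president" and half of their modifiers, so their clusters merge, while the bare "president" stays separate because its modifier set does not overlap. A real system would presumably derive the scores from corpus-wide usage statistics rather than from a fixed rule.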

Status:

Application

Type:

Utility

Filing date:

17 Feb 2020

Issue date:

19 Aug 2021