International Business Machines Corporation
Identifying ambiguity in semantic resources

Abstract:

Embodiments relate to a system, program product, and method for dictionary membership management directed at identifying ambiguity in semantic resources. A dictionary of seed terms is applied to a text corpus and matching items in the corpus are identified. The linguistic properties for each matching item are characterized and a context pattern of each matching item is constructed. Each context pattern is applied to the dictionary and matching content between the seed terms and the context pattern is identified and quantified. Lexicon items from the dictionary that have anomalous behavior reflected in the quantification are identified. One or more seed words identified as having anomalous behavior are selectively removed from the dictionary.
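The steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation, and it rests on several assumptions that do not come from the source: seed terms are single lowercase words, the "context pattern" is reduced to the pair (previous word, next word), the quantification is the fraction of a seed's patterns that also match at least one other seed, and a seed is treated as anomalous when that fraction falls below a hypothetical `threshold`. All function names and the toy corpus are illustrative only.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Assumption: plain lowercase word tokens stand in for real linguistic analysis.
    return re.findall(r"[a-z]+", text.lower())


def context_patterns(corpus, seeds):
    """Map each (previous word, next word) pattern to the seed terms found in it."""
    pattern_to_seeds = defaultdict(set)
    for sentence in corpus:
        tokens = ["<s>"] + tokenize(sentence) + ["</s>"]
        for i in range(1, len(tokens) - 1):
            if tokens[i] in seeds:
                pattern_to_seeds[(tokens[i - 1], tokens[i + 1])].add(tokens[i])
    return pattern_to_seeds


def shared_context_scores(pattern_to_seeds, seeds):
    """For each seed seen in the corpus: share of its patterns matched by another seed."""
    total = defaultdict(int)
    shared = defaultdict(int)
    for matched in pattern_to_seeds.values():
        for seed in matched:
            total[seed] += 1
            if len(matched) > 1:
                shared[seed] += 1
    return {s: shared[s] / total[s] for s in seeds if total[s]}


def prune(seeds, scores, threshold=0.5):
    # Hypothetical rule: drop seeds whose contexts are rarely shared with other seeds.
    # Seeds never seen in the corpus are kept, since there is no evidence against them.
    return {s for s in seeds if scores.get(s, 1.0) >= threshold}


# Toy data: a fruit dictionary in which "apple" also appears as a company name.
seeds = {"apple", "banana", "cherry"}
corpus = [
    "She ate the apple today",
    "She ate the banana today",
    "She ate the cherry today",
    "He sliced the banana carefully",
    "He sliced the cherry carefully",
    "Shares of Apple rose",
    "The Apple keynote starts soon",
    "Investors say Apple will expand",
]

patterns = context_patterns(corpus, seeds)
scores = shared_context_scores(patterns, seeds)
kept = prune(seeds, scores)
print(scores)
print(kept)
```

In this toy corpus "apple" shares only one of its four contexts with the other seeds, so it falls below the threshold and is removed, while "banana" and "cherry" are retained. A realistic system would likely use richer linguistic properties (part of speech, dependency relations) and a statistically motivated anomaly test rather than a fixed cutoff.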

Status:

Grant

Type:

Utility

Filing date:

29 Jul 2019

Issue date:

5 Jul 2022