International Business Machines Corporation
Generation of domain thesaurus
Abstract:
Embodiments provide a computer-implemented method for generating a domain-specific thesaurus on a cognitive system, comprising: receiving data of a domain-specific corpus and a plurality of terms of interest from a user; splitting the data of the domain-specific corpus into a plurality of sentences using natural language processing techniques; for each term in the plurality of terms of interest, retrieving, from the plurality of sentences, a plurality of candidate sentences containing the corresponding term; for each candidate sentence, providing a list of synonyms of the corresponding term, wherein the synonyms are contextual alternatives in the corresponding candidate sentence; for each term in the plurality of terms of interest, tracking a frequency of each synonym, and forming a frequency map including all the synonyms of the corresponding term and the frequency of each synonym; and generating the domain-specific thesaurus based on a combination of all the synonyms in the frequency map.
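The steps recited in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function names (`split_sentences`, `candidate_sentences`, `build_frequency_map`, `build_thesaurus`) and the `suggest` callable are hypothetical, a regular expression stands in for the natural language processing used to split sentences, and the abstract does not say how contextual synonyms are obtained, so that step is left to a caller-supplied function. Ordering the thesaurus entries by frequency is likewise an assumption about how the synonyms might be combined.

```python
import re
from collections import Counter
from typing import Callable, Dict, Iterable, List

# Hypothetical interface: given (sentence, term), return contextual
# alternatives for the term in that sentence. The abstract does not
# specify how these are produced.
SynonymProvider = Callable[[str, str], Iterable[str]]


def split_sentences(corpus: str) -> List[str]:
    """Split text after '.', '!' or '?' followed by whitespace.

    Assumption: a regex stands in for real NLP sentence segmentation.
    """
    parts = re.split(r"(?<=[.!?])\s+", corpus)
    return [p.strip() for p in parts if p.strip()]


def candidate_sentences(term: str, sentences: List[str]) -> List[str]:
    """Return the sentences that contain the term as a whole word."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]


def build_frequency_map(
    corpus: str, terms: Iterable[str], suggest: SynonymProvider
) -> Dict[str, Counter]:
    """For each term, count how often each suggested synonym occurs."""
    sentences = split_sentences(corpus)
    freq_map: Dict[str, Counter] = {}
    for term in terms:
        counts: Counter = Counter()
        for sentence in candidate_sentences(term, sentences):
            # One list of contextual synonyms per candidate sentence.
            counts.update(list(suggest(sentence, term)))
        freq_map[term] = counts
    return freq_map


def build_thesaurus(freq_map: Dict[str, Counter]) -> Dict[str, List[str]]:
    """Combine all synonyms per term, most frequent first (assumed order)."""
    return {
        term: [syn for syn, _ in counts.most_common()]
        for term, counts in freq_map.items()
    }
```

In use, `build_frequency_map(corpus, terms, suggest)` would be called with the user's corpus text and terms of interest, and its result passed to `build_thesaurus`. Keeping the frequency map as a separate intermediate result mirrors the abstract, which forms the map before generating the thesaurus from it.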
Utility
26 Apr 2019
26 Jul 2022