International Business Machines Corporation
Term extraction in highly technical domains

Last updated: 15 Jun 2022

Abstract:

A language model is fine-tuned using terminology terms extracted from a text document. The method comprises identifying a text snippet, identifying candidate multi-word expressions using part-of-speech tags, and determining a specificity score value for each of the candidate multi-word expressions. The method further comprises determining a topic similarity score value for each of the candidate multi-word expressions; selecting remaining expressions from the candidate multi-word expressions using a function of the specificity value and the topic similarity value of each candidate; adding a noun contained in the text snippet to the remaining expressions depending on a correlation function; labeling the remaining multi-word expressions; and fine-tuning an existing pre-trained transformer-based language model using, as training data, the identified text snippet marked with the labeled remaining expressions.
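The selection steps in the abstract can be illustrated with a minimal Python sketch. The abstract does not define the scoring functions, so everything here is an assumption made for illustration: the ADJ/NOUN tag pattern, the log-ratio specificity score, the cosine topic similarity, the weighted-sum selection function (`alpha`, `threshold`), and the "share of selected terms containing the noun" used in place of the unspecified correlation function are all hypothetical. Tokens are assumed to be POS-tagged already, and the fine-tuning step is omitted; the sketch stops at BIO labels that a token-classification model could be trained on.

```python
import math
from collections import Counter

# Hypothetical POS pattern: a run of ADJ/NOUN tokens that ends in a NOUN.
MODIFIER_TAGS = {"ADJ", "NOUN"}


def candidate_mwes(tagged):
    """Return (start, end) token spans (end exclusive) of multi-word candidates."""
    spans, i = [], 0
    while i < len(tagged):
        if tagged[i][1] not in MODIFIER_TAGS:
            i += 1
            continue
        j = i
        while j < len(tagged) and tagged[j][1] in MODIFIER_TAGS:
            j += 1
        end = j
        while end > i and tagged[end - 1][1] != "NOUN":  # drop trailing adjectives
            end -= 1
        if end - i >= 2:  # keep multi-word expressions only
            spans.append((i, end))
        i = j
    return spans


def specificity(words, domain_freq, general_freq):
    """Hypothetical specificity: mean smoothed log-ratio of domain vs. general frequency."""
    ratios = [math.log((domain_freq.get(w, 0) + 1) / (general_freq.get(w, 0) + 1))
              for w in words]
    return sum(ratios) / len(ratios)


def topic_similarity(words, topic_weights):
    """Hypothetical topic similarity: cosine between word counts and a topic vector."""
    counts = Counter(words)
    dot = sum(c * topic_weights.get(w, 0.0) for w, c in counts.items())
    norm_c = math.sqrt(sum(c * c for c in counts.values()))
    norm_t = math.sqrt(sum(v * v for v in topic_weights.values()))
    return dot / (norm_c * norm_t) if norm_c and norm_t else 0.0


def select_terms(tagged, domain_freq, general_freq, topic_weights,
                 alpha=0.5, threshold=0.5, noun_threshold=0.5):
    """Keep candidates whose weighted score passes a threshold, then add single nouns."""
    words = [w.lower() for w, _ in tagged]
    kept = []
    for start, end in candidate_mwes(tagged):
        span = words[start:end]
        score = (alpha * specificity(span, domain_freq, general_freq)
                 + (1 - alpha) * topic_similarity(span, topic_weights))
        if score >= threshold:
            kept.append((start, end))
    # Stand-in for the correlation function: share of kept terms containing the noun.
    covered = {k for s, e in kept for k in range(s, e)}
    added = []
    for k, (_, tag) in enumerate(tagged):
        if tag != "NOUN" or k in covered or not kept:
            continue
        share = sum(words[k] in words[s:e] for s, e in kept) / len(kept)
        if share >= noun_threshold:
            added.append((k, k + 1))
    return sorted(kept + added)


def bio_labels(n_tokens, spans):
    """Mark selected spans with BIO tags for token-classification training data."""
    labels = ["O"] * n_tokens
    for start, end in spans:
        labels[start] = "B-TERM"
        for k in range(start + 1, end):
            labels[k] = "I-TERM"
    return labels


# Illustrative, made-up input data.
tagged = [("The", "DET"), ("graphene", "NOUN"), ("oxide", "NOUN"), ("layer", "NOUN"),
          ("improves", "VERB"), ("thermal", "ADJ"), ("conductivity", "NOUN"),
          ("in", "ADP"), ("recent", "ADJ"), ("studies", "NOUN"), ("of", "ADP"),
          ("the", "DET"), ("oxide", "NOUN"), (".", "PUNCT")]
domain_freq = {"graphene": 50, "oxide": 40, "thermal": 30, "conductivity": 30,
               "layer": 20, "recent": 2, "studies": 5}
general_freq = {"graphene": 1, "oxide": 3, "thermal": 5, "conductivity": 2,
                "layer": 60, "recent": 50, "studies": 40}
topic_weights = {"graphene": 1.0, "oxide": 0.5, "conductivity": 0.8}

terms = select_terms(tagged, domain_freq, general_freq, topic_weights)
labels = bio_labels(len(tagged), terms)
print(terms)  # [(1, 4), (5, 7), (12, 13)]
```

Under these assumptions, "recent studies" is dropped because its words are more frequent in the general corpus than in the domain corpus, while the lone noun "oxide" is added because it occurs in half of the selected terms.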

Status:

Grant

Type:

Utility

Filing date:

28 Jun 2021

Issue date:

14 Jun 2022

Full patent description
