International Business Machines Corporation
DOMAIN KNOWLEDGE BASED FEATURE EXTRACTION FOR ENHANCED TEXT REPRESENTATION

Last updated: 18 May 2022

Abstract:

Provided are a method, system, and computer program product for representing text, in which a text is received and analyzed by utilizing a pre-trained embedding model and a feature vector model, wherein selected words in the text have corresponding weights. Operations whose parameters include weights of a feature vector and an embedding are performed to generate a weighted embedding data structure. A summation is performed of all corresponding columns of a plurality of rows of the weighted embedding data structure to generate a data structure that represents the text. The data structure that represents the text is utilized to generate at least one of a classification metadata for the text and a summarization of the text.
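The core computation in the abstract (weight each word's embedding, then sum corresponding columns across all rows to get one vector for the text) can be illustrated with a small sketch. Everything named below is a hypothetical assumption for illustration, not the application's actual method. The toy `EMBEDDINGS` dictionary stands in for the pre-trained embedding model. `DOMAIN_WEIGHTS` stands in for the feature vector model's per-word weights. The cosine-similarity `classify` step with made-up `CENTROIDS` is a swapped-in stand-in for however the classification metadata is really produced.

```python
import math
from typing import Dict, List

# Hypothetical stand-in for a pre-trained embedding model: toy 3-d word vectors.
EMBEDDINGS: Dict[str, List[float]] = {
    "the": [0.25, 0.25, 0.25],
    "interest": [0.5, 1.0, 0.0],
    "rate": [0.0, 0.25, 1.0],
    "loan": [1.0, 0.0, 0.5],
}

# Hypothetical stand-in for the feature vector model: selected domain words get
# larger weights; any word not listed here gets the default weight of 1.0.
DOMAIN_WEIGHTS: Dict[str, float] = {"interest": 3.0, "loan": 2.0}

# Hypothetical class centroids, used only by the illustrative classifier below.
CENTROIDS: Dict[str, List[float]] = {
    "finance": [1.0, 1.0, 0.5],
    "weather": [0.0, 0.0, 1.0],
}


def weighted_embedding(tokens: List[str],
                       embeddings: Dict[str, List[float]],
                       weights: Dict[str, float],
                       default_weight: float = 1.0) -> List[List[float]]:
    """Build one row per in-vocabulary token: its embedding scaled by its weight."""
    rows = []
    for tok in tokens:
        vec = embeddings.get(tok)
        if vec is None:
            continue  # this sketch simply skips out-of-vocabulary words
        w = weights.get(tok, default_weight)
        rows.append([w * x for x in vec])
    return rows


def text_vector(rows: List[List[float]], dim: int) -> List[float]:
    """Sum each column over all rows to obtain a single vector for the text."""
    return [sum(row[j] for row in rows) for j in range(dim)]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def classify(vec: List[float], centroids: Dict[str, List[float]]) -> str:
    """Return the label whose (hypothetical) centroid is most similar to the text vector."""
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


if __name__ == "__main__":
    tokens = "the interest rate on the loan".split()
    rows = weighted_embedding(tokens, EMBEDDINGS, DOMAIN_WEIGHTS)
    vec = text_vector(rows, dim=3)
    print(vec)                       # [4.0, 3.75, 2.5]
    print(classify(vec, CENTROIDS))  # finance
```

In this toy run, "on" is skipped as out of vocabulary, and the weights on "interest" and "loan" let those domain words dominate the summed vector. The sketch sums rather than averages the rows, following the abstract's description of summing corresponding columns.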

Status:

Application

Type:

Utility

Filing date:

22 Oct 2020

Issue date:

28 Apr 2022
