International Business Machines Corporation
DOMAIN KNOWLEDGE BASED FEATURE EXTRACTION FOR ENHANCED TEXT REPRESENTATION
Abstract:
Provided are a method, system, and computer program product for representing text. A text is received and analyzed using a pre-trained embedding model and a feature vector model, in which selected words in the text have corresponding weights. Operations whose parameters include the weights of a feature vector and an embedding are performed to generate a weighted embedding data structure. All corresponding columns of a plurality of rows of the weighted embedding data structure are summed to generate a data structure that represents the text. This data structure is then used to generate classification metadata for the text, a summarization of the text, or both.
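The abstract does not specify the weighting operation, the embedding model, or the classifier. The following is therefore only a minimal illustrative sketch under stated assumptions, not the claimed method. It assumes that each word's embedding row is scaled by its domain weight, with unselected words defaulting to a weight of 1.0. The column-wise sum over rows then yields a single text vector. The toy vectors `EMBEDDINGS`, the weights `DOMAIN_WEIGHTS`, the labels `PROTOTYPES`, and all function names are hypothetical. The nearest-prototype cosine classifier is a swapped-in technique chosen to show one way the text vector could produce classification metadata.

```python
import numpy as np

# Hypothetical toy stand-in for a pre-trained embedding model: word -> vector.
EMBEDDINGS = {
    "server": np.array([0.2, 0.1, 0.7]),
    "outage": np.array([0.9, 0.0, 0.1]),
    "the": np.array([0.1, 0.1, 0.1]),
    "report": np.array([0.3, 0.6, 0.1]),
}

# Hypothetical domain-knowledge weights for selected words (the "feature vector").
DOMAIN_WEIGHTS = {"outage": 3.0, "server": 2.0}

# Hypothetical class prototypes used to derive classification metadata.
PROTOTYPES = {
    "incident": np.array([1.0, 0.0, 0.3]),
    "documentation": np.array([0.1, 1.0, 0.0]),
}


def weighted_embedding(tokens, embeddings, weights, default_weight=1.0):
    """Build a matrix with one row per known token: weight * embedding.

    Assumption: scaling by the weight is the combining operation, and words
    without a domain weight default to 1.0. Unknown tokens are skipped.
    """
    dim = len(next(iter(embeddings.values())))
    rows = [weights.get(t, default_weight) * embeddings[t]
            for t in tokens if t in embeddings]
    if not rows:
        return np.zeros((0, dim))  # no known words: empty matrix, correct width
    return np.vstack(rows)


def represent_text(text, embeddings=EMBEDDINGS, weights=DOMAIN_WEIGHTS):
    """Sum each column over all rows of the weighted matrix -> one text vector."""
    tokens = text.lower().split()
    matrix = weighted_embedding(tokens, embeddings, weights)
    return matrix.sum(axis=0)


def classify(vector, prototypes=PROTOTYPES):
    """Return the label whose prototype has the highest cosine similarity."""
    def cosine(a, b):
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b / denom) if denom else 0.0
    return max(prototypes, key=lambda label: cosine(vector, prototypes[label]))


if __name__ == "__main__":
    vec = represent_text("The server outage report")
    print(vec, classify(vec))
```

With these toy values, "The server outage report" sums to roughly `[3.5, 0.9, 1.9]`. The heavily weighted word "outage" dominates, so the sketch labels the text `incident`. Raising or lowering a word's domain weight shifts the text vector toward or away from that word's embedding, which is the intuition behind weighting selected words.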
Utility
22 Oct 2020
28 Apr 2022