International Business Machines Corporation
Multi-modal document feature extraction

Abstract:

Systems and methods are described for generating a machine learning model for multi-modal feature extraction. The method may include receiving a document in a digital format, where the digital format comprises text information and image information, performing a text extraction function on a first portion of the document to produce a set of text features, performing an image extraction function on a second portion of the document to produce a set of image features, generating a feature tree, wherein a plurality of nodes of the feature tree correspond to the set of text features and the set of image features, and generating an input vector for a machine learning model based on the feature tree. In some cases, the feature tree may be generated synthetically, or modified by a user prior to being converted into the input vector.
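
The flow in the abstract (extract text features, extract image features, arrange them as nodes of a feature tree, convert the tree to an input vector) can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: the `FeatureNode` class, the function names, the toy features (token count, average token length, mean pixel intensity, image size), and the depth-first flattening are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FeatureNode:
    """Hypothetical tree node; the abstract does not specify a node layout."""
    name: str
    modality: str  # "text", "image", or "root"
    values: List[float] = field(default_factory=list)
    children: List["FeatureNode"] = field(default_factory=list)


def extract_text_features(text: str) -> List[FeatureNode]:
    # Toy stand-in for a text extraction function: whitespace token statistics.
    tokens = text.split()
    avg_len = sum(len(t) for t in tokens) / len(tokens) if tokens else 0.0
    return [
        FeatureNode("token_count", "text", [float(len(tokens))]),
        FeatureNode("avg_token_length", "text", [avg_len]),
    ]


def extract_image_features(pixels: List[List[int]]) -> List[FeatureNode]:
    # Toy stand-in for an image extraction function: grayscale statistics.
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat) if flat else 0.0
    height = len(pixels)
    width = len(pixels[0]) if pixels else 0
    return [
        FeatureNode("mean_intensity", "image", [mean]),
        FeatureNode("size", "image", [float(width), float(height)]),
    ]


def build_feature_tree(text: str, pixels: List[List[int]]) -> FeatureNode:
    # One subtree per document portion, both hanging off a shared root.
    root = FeatureNode("document", "root")
    root.children.append(
        FeatureNode("text_region", "text", children=extract_text_features(text))
    )
    root.children.append(
        FeatureNode("image_region", "image", children=extract_image_features(pixels))
    )
    return root


def to_input_vector(node: FeatureNode) -> List[float]:
    # Depth-first, pre-order flattening: a node's values, then its children's.
    vector = list(node.values)
    for child in node.children:
        vector.extend(to_input_vector(child))
    return vector


tree = build_feature_tree("net income rose", [[0, 255], [255, 0]])
vector = to_input_vector(tree)
```

Because the tree is an ordinary in-memory structure in this sketch, a user-modified or synthetically built tree (as the abstract mentions) would pass through `to_input_vector` the same way as an extracted one.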

Status:

Grant

Type:

Utility

Filing date:

6 Dec 2018

Issue date:

7 Dec 2021