Microsoft Corporation
Processing image-bearing electronic documents using a multimodal fusion framework

Last updated: 13 Apr 2022

Abstract:

A computer-implemented technique uses one or more neural networks to identify at least one item name associated with an input image using a multi-modal fusion approach. The technique is multi-modal because it collects and processes different kinds of evidence regarding each detected item name. It adopts a fusion approach because it fuses the multi-modal evidence into an output conclusion that identifies at least one item name associated with the input image. In one example, a first mode collects evidence by identifying and analyzing regions in the input image that are likely to include item name-related information. A second mode collects and analyzes any text that appears as part of the input image itself. A third mode collects and analyzes text that is not included in the input image itself, but is nonetheless associated with the input image.
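
The fusion idea described in the abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract describes neural networks, while this sketch substitutes a simple weighted average (late fusion) over per-mode confidence scores. The function names `fuse_evidence` and `top_item_name`, the score dictionaries, the equal weights, and the sample item names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical representation: each mode maps a candidate item name
# to a confidence score in [0, 1]. The source does not specify this format.
Evidence = Dict[str, float]


def fuse_evidence(modes: List[Evidence], weights: List[float]) -> Dict[str, float]:
    """Combine per-mode scores with a weighted average (a stand-in for
    the neural fusion described in the abstract)."""
    if len(modes) != len(weights):
        raise ValueError("one weight is required per mode")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    fused: Dict[str, float] = defaultdict(float)
    for evidence, weight in zip(modes, weights):
        for name, score in evidence.items():
            # A name missing from a mode contributes zero for that mode.
            fused[name] += weight * score / total
    return dict(fused)


def top_item_name(fused: Dict[str, float]) -> str:
    """Return the candidate item name with the highest fused score."""
    return max(fused, key=fused.get)


# Mode 1: evidence from regions of the image likely to show an item name.
region_evidence = {"Contoso Cola": 0.7, "Contoso Water": 0.2}
# Mode 2: evidence from text that appears inside the image itself.
in_image_text_evidence = {"Contoso Cola": 0.9}
# Mode 3: evidence from text associated with, but outside, the image.
associated_text_evidence = {"Contoso Cola": 0.5, "Contoso Juice": 0.4}

fused = fuse_evidence(
    [region_evidence, in_image_text_evidence, associated_text_evidence],
    [1.0, 1.0, 1.0],  # equal weights are an assumption, not from the source
)
best = top_item_name(fused)
```

In this sketch, an item name supported by all three modes outranks names supported by only one, which is the intuition behind fusing several kinds of evidence into one conclusion.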

Status:

Grant

Type:

Utility

Filing date:

25 Mar 2020

Issue date:

12 Apr 2022
