International Business Machines Corporation
ADAPTIVE CYCLE CONSISTENCY MULTIMODAL IMAGE CAPTIONING

Last updated:

Abstract:

In an approach to improving the image captioning performance of low-resource languages by leveraging multimodal inputs, one or more computer processors encode an image utilizing an image encoder, wherein the image is contained within a triplet comprising the image, one or more high-resource captions, and one or more low-resource captions. The one or more computer processors generate one or more high-resource captions utilizing the encoded image and the triplet inputted into a high-resource decoder. The one or more computer processors encode the one or more generated high-resource captions utilizing a high-resource encoder. The one or more computer processors add adaptive cycle consistency constraints on a set of calculated attention weights associated the triplet. The one or more computer processors generate one or more low-resource captions by simultaneously inputting the encoded image, the encoded high-resource caption, and the triplet into a trained low-resource decoder.

Status:
Application
Type:

Utility

Filling date:

8 Jul 2020

Issue date:

13 Jan 2022