Microsoft Corporation
Automatically evaluating caption quality of rich media using context learning

Abstract:

Technologies for evaluating, scoring, and determining whether to present a caption of an image are provided. The disclosed techniques include receiving an image with associated metadata. Contextual data is identified from the image and the metadata. A generated caption for the image is received from an image caption generator. A first vector representation is generated based on the contextual image data and a second vector representation is generated based on the generated caption. A machine learned model generates a score for the generated caption using the first vector representation and the second vector representation. The score represents a confidence value defining how accurately the caption describes the image. Based on the score, the caption may be presented along with the image on a client device.
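The abstract does not name the encoders, the model, or the decision threshold, so the following is only a minimal sketch of the scoring flow under stated assumptions. In this sketch, image-derived tags are assumed to come from some upstream detector. Both vector representations are hypothetical bag-of-words counts over a shared vocabulary. The "machine learned model" is replaced by a logistic function of cosine similarity, with hypothetical `weight`, `bias`, and `threshold` values. All function names here are illustrative and are not taken from the patent.

```python
import math
from collections import Counter

# Hypothetical stand-ins throughout: the source does not specify the encoders
# or the model, so this uses bag-of-words vectors and a logistic score.

def contextual_tokens(image_tags, metadata):
    """Combine image-derived tags (assumed to come from an upstream detector)
    with the words found in the image's metadata fields."""
    tokens = [t.lower() for t in image_tags]
    for value in metadata.values():
        tokens.extend(str(value).lower().split())
    return tokens

def embed(tokens, vocab):
    """Hypothetical bag-of-words vector: one count per vocabulary term."""
    counts = Counter(tokens)
    return [float(counts[w]) for w in vocab]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def score_caption(context_vec, caption_vec, weight=6.0, bias=-3.0):
    """Stand-in for the machine learned model: a logistic function of the
    cosine similarity, giving a confidence value in (0, 1). The weight and
    bias are hypothetical constants, not learned parameters."""
    z = weight * cosine(context_vec, caption_vec) + bias
    return 1.0 / (1.0 + math.exp(-z))

def evaluate_caption(image_tags, metadata, caption, threshold=0.5):
    """Return (score, present) for a generated caption; the threshold is a
    hypothetical cutoff for deciding whether to show the caption."""
    context = contextual_tokens(image_tags, metadata)
    caption_tokens = caption.lower().split()
    vocab = sorted(set(context) | set(caption_tokens))
    first_vec = embed(context, vocab)          # from contextual data
    second_vec = embed(caption_tokens, vocab)  # from the generated caption
    score = score_caption(first_vec, second_vec)
    return score, score >= threshold

if __name__ == "__main__":
    tags = ["dog", "grass", "ball"]
    meta = {"title": "dog park", "alt_text": "dog playing"}
    print(evaluate_caption(tags, meta, "dog playing ball on grass"))
    print(evaluate_caption(tags, meta, "city skyline at night"))
```

A caption that shares terms with the image context scores above the threshold and would be presented. A caption with no overlap scores near zero and would be withheld. A real system would replace both `embed` and `score_caption` with trained components.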

Status:
Grant
Type:

Utility

Filing date:

30 Aug 2019

Issue date:

14 Sep 2021