Microsoft Corporation
Automatically evaluating caption quality of rich media using context learning
Abstract:
Technologies for evaluating, scoring, and determining whether to present a caption of an image are provided. The disclosed techniques include receiving an image with associated metadata. Contextual data is identified from the image and the metadata. A generated caption for the image is received from an image caption generator. A first vector representation is generated based on the contextual data, and a second vector representation is generated based on the generated caption. A machine-learned model generates a score for the generated caption using the first vector representation and the second vector representation. The score represents a confidence value indicating how accurately the caption describes the image. Based on the score, the caption may be presented along with the image on a client device.
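The flow described in the abstract (two vector representations, a score, and a presentation decision) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the disclosed implementation: the hashed bag-of-words `embed` function stands in for whatever encoder produces the vector representations, plain cosine similarity stands in for the machine-learned scoring model, and the names `score_caption` and `should_present`, the vector size, and the 0.5 threshold are all hypothetical.

```python
import hashlib
import math

DIM = 64  # hypothetical vector size, chosen only for this sketch


def embed(text, dim=DIM):
    """Hashed bag-of-words vector (a stand-in for a learned encoder)."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # Stable hash so the same token always maps to the same bucket.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    return vec


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_caption(context_terms, caption):
    """Score a caption against contextual data.

    Assumption: cosine similarity replaces the machine-learned model
    that the abstract describes.
    """
    context_vec = embed(" ".join(context_terms))  # first vector representation
    caption_vec = embed(caption)                  # second vector representation
    return cosine(context_vec, caption_vec)


def should_present(score, threshold=0.5):
    """Present the caption only if the score meets a hypothetical threshold."""
    return score >= threshold


if __name__ == "__main__":
    # Contextual terms as they might be drawn from image content and metadata.
    context = ["dog", "beach", "running", "sunset"]
    caption = "a dog running on the beach"
    score = score_caption(context, caption)
    print(f"score={score:.3f} present={should_present(score)}")
```

In a real system the score would come from a trained model and the threshold would be tuned on labeled caption data. The sketch only shows how the two representations feed a single confidence value that gates whether the caption is shown.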
Utility
30 Aug 2019
14 Sep 2021