Adobe Inc.
SELF-SUPERVISED VISUAL-RELATIONSHIP PROBING

Abstract:

Methods and systems disclosed herein relate generally to generating visual relationship graphs that identify relationships between objects depicted in an image. A vision-language application uses transformer encoders to generate a graph structure that represents a dependency between a first region and a second region of an image. The dependency indicates that a contextual representation of the first region was derived, at least in part, by processing the second region. The contextual representation identifies a predicted identity of an image object depicted in the first region. The predicted identity is determined at least in part by identifying a relationship between the first region and other data objects associated with various modalities.
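
As a rough illustration only, the graph structure described in the abstract could be represented as a set of directed edges between image regions, where an edge from a first region to a second region records that the first region's contextual representation depended on the second. The sketch below is not the claimed method: the names RegionGraph and build_graph, the use of a pairwise weight matrix (for example, attention-like scores from a transformer encoder), and the 0.3 threshold are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class RegionGraph:
    """Hypothetical container: one node per image region, directed dependency edges."""
    labels: List[str]                                  # predicted identity per region
    edges: Dict[int, List[int]] = field(default_factory=dict)

    def add_dependency(self, first: int, second: int) -> None:
        """Record that region `first` was derived in part by processing region `second`."""
        targets = self.edges.setdefault(first, [])
        if second not in targets:
            targets.append(second)


def build_graph(labels: Sequence[str],
                weights: Sequence[Sequence[float]],
                threshold: float = 0.3) -> RegionGraph:
    """Assumption: weights[i][j] scores how much region i relied on region j.

    An edge i -> j is kept when the score reaches the (arbitrary) threshold.
    Self-dependencies are skipped.
    """
    graph = RegionGraph(labels=list(labels))
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            if i != j and w >= threshold:
                graph.add_dependency(i, j)
    return graph


# Made-up example values, not data from the application.
labels = ["person", "horse", "grass"]
weights = [
    [0.50, 0.40, 0.10],
    [0.35, 0.60, 0.05],
    [0.10, 0.20, 0.70],
]
graph = build_graph(labels, weights)
```

With these made-up scores, the "person" and "horse" regions depend on each other and the "grass" region has no outgoing dependency. A real system would derive the dependencies from the encoder itself rather than from a hand-written matrix.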

Status:

Application

Type:

Utility

Filing date:

9 Nov 2020

Issue date:

12 May 2022