NVIDIA Corporation
SCENE EMBEDDING FOR VISUAL NAVIGATION

Last updated:

Abstract:

Navigation instructions are determined using visual data or other sensory information. Individual frames are extracted from video data, captured during passes through an environment, to generate a sequence of image frames. The frames are processed using a feature extractor to generate frame-specific feature vectors. Image triplets are then generated, each including a representative image frame (or corresponding feature vector), a similar image frame adjacent in the sequence, and a disparate image frame separated by a number of frames in the sequence. An embedding network is trained using these triplets. Image data for a current position and a target destination can then be provided as input to the trained embedding network, which outputs a navigation vector indicating a direction and distance over which the vehicle is to be navigated in the physical environment.
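The triplet construction described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name, the window defining "adjacent" frames (`pos_window`), and the minimum separation defining "disparate" frames (`neg_gap`) are all assumptions, since the abstract does not specify exact values.

```python
import random

def make_triplets(num_frames, pos_window=1, neg_gap=30, seed=0):
    """Build (anchor, positive, negative) index triplets over a frame sequence.

    Assumed convention: a frame pos_window steps away is "similar",
    and a frame at least neg_gap steps away is "disparate".
    """
    rng = random.Random(seed)
    triplets = []
    for anchor in range(num_frames):
        # Positive candidates: frames adjacent to the anchor in the sequence.
        pos_candidates = [i for i in (anchor - pos_window, anchor + pos_window)
                          if 0 <= i < num_frames]
        # Negative candidates: frames well separated from the anchor.
        neg_candidates = [i for i in range(num_frames)
                          if abs(i - anchor) >= neg_gap]
        if not pos_candidates or not neg_candidates:
            continue
        triplets.append((anchor,
                         rng.choice(pos_candidates),
                         rng.choice(neg_candidates)))
    return triplets

triplets = make_triplets(100)
```

Each triplet's indices would select feature vectors produced by the feature extractor, which are then fed to a triplet loss (e.g. margin-based) to train the embedding network.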

Status:

Application

Type:

Utility

Filing date:

19 Jan 2021

Issue date:

13 May 2021