Snap Inc.
Text and audio-based real-time face reenactment

Last updated:

Abstract:

Provided are systems and methods for text and audio-based real-time face reenactment. An example method includes receiving an input text and a target image, the target image including a target face; generating, based on the input text, a sequence of sets of acoustic features representing the input text; determining, based on the sequence of sets of acoustic features, a sequence of sets of scenario data indicating modifications of the target face for pronouncing the input text; generating, based on the sequence of sets of scenario data, a sequence of frames, wherein each of the frames includes the target face modified based on at least one of the sets of scenario data; generating, based on the sequence of frames, an output video; and synthesizing, based on the sequence of sets of acoustic features, an audio data and adding the audio data to the output video.

Status:
Grant
Type:

Utility

Filling date:

11 Jul 2019

Issue date:

7 Sep 2021