Adobe Inc.
EXPRESSIVE TEXT-TO-SPEECH UTILIZING CONTEXTUAL WORD-LEVEL STYLE TOKENS

Last updated: 2 Mar 2022

Abstract:

The present disclosure relates to systems, methods, and non-transitory computer-readable media that generate expressive audio for input texts based on a word-level analysis of the input text. For example, the disclosed systems can utilize a multi-channel neural network to generate a character-level feature vector and a word-level feature vector based on a plurality of characters of an input text and a plurality of words of the input text, respectively. In some embodiments, the disclosed systems utilize the neural network to generate the word-level feature vector based on contextual word-level style tokens that correspond to style features associated with the input text. Based on the character-level and word-level feature vectors, the disclosed systems can generate a context-based speech map. The disclosed systems can utilize the context-based speech map to generate expressive audio for the input text.
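
The data flow described in the abstract can be sketched roughly as follows. This is a toy illustration, not the disclosed multi-channel neural network. The names `embed`, `word_level_features`, `context_based_speech_map`, `STYLE_TOKENS`, and `DIM` are hypothetical. Seeded random vectors stand in for learned embeddings, the attention over style tokens ignores neighboring words, and the two channels are combined by simple concatenation, which is an assumption.

```python
import math
import random

DIM = 4  # hypothetical feature size, chosen only for illustration


def embed(symbol, dim=DIM):
    """Deterministic stand-in for a learned embedding (an assumption)."""
    rng = random.Random(symbol)  # seeding with a str is reproducible
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def word_level_features(words, style_tokens):
    """One vector per word: the word embedding plus an attention-weighted
    mix of the style tokens (toy version; no surrounding context is used)."""
    features = []
    for word in words:
        query = embed(word)
        weights = softmax([dot(query, token) for token in style_tokens])
        style = [
            sum(w * token[i] for w, token in zip(weights, style_tokens))
            for i in range(DIM)
        ]
        features.append([q + s for q, s in zip(query, style)])
    return features


def context_based_speech_map(text, style_tokens):
    """Pair each character vector with the vector of the word containing it."""
    words = text.split()
    word_vectors = word_level_features(words, style_tokens)
    speech_map = []
    for word, word_vector in zip(words, word_vectors):
        for char in word:
            # List concatenation: character channel followed by word channel.
            speech_map.append(embed(char) + word_vector)
    return speech_map


STYLE_TOKENS = [embed(f"style-{i}") for i in range(3)]
speech_map = context_based_speech_map("hello expressive world", STYLE_TOKENS)
print(len(speech_map), len(speech_map[0]))  # prints "20 8"
```

In this sketch the resulting map holds one row per non-space character, and each row carries both the character-level and the word-level features. A real system would pass such a map to a decoder that produces audio, a step omitted here.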

Status:

Application

Type:

Utility

Filing date:

21 Jul 2020

Issue date:

27 Jan 2022

Full patent description

Patent application document

Adobe Inc. EXPRESSIVE TEXT-TO-SPEECH UTILIZING CONTEXTUAL WORD-LEVEL STYLE TOKENS
