Amazon.com, Inc.
Text-to-speech processing using input voice characteristic data

Abstract:

During text-to-speech processing, a speech model creates synthesized speech that corresponds to input data. The speech model may include an encoder for encoding the input data into a context vector and a decoder for decoding the context vector into spectrogram data. The speech model may further include a voice decoder that receives vocal characteristic data representing a desired vocal characteristic of synthesized speech. The voice decoder may process the vocal characteristic data to determine configuration data, such as weights, for use by the speech decoder.
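The data flow described in the abstract can be illustrated with a toy sketch. This is not the patented implementation: the function names (`encode`, `voice_decoder`, `speech_decoder`, `synthesize`), the dimensions, and the arithmetic are all hypothetical stand-ins for trained neural networks. It only shows the wiring, in which the voice decoder turns vocal characteristic data into weights that the speech decoder then applies to the context vector.

```python
import math

CONTEXT_DIM = 4  # assumed size of the context vector
N_BINS = 3       # assumed number of spectrogram bins per frame


def encode(text):
    """Encoder stand-in: map input text to a fixed-size context vector."""
    vec = [0.0] * CONTEXT_DIM
    for i, ch in enumerate(text):
        vec[i % CONTEXT_DIM] += ord(ch) / 255.0
    n = max(len(text), 1)
    return [v / n for v in vec]


def voice_decoder(vocal_characteristics):
    """Voice decoder stand-in: derive speech-decoder weights from
    vocal characteristic data (one weight row per spectrogram bin)."""
    total = sum(vocal_characteristics)
    return [
        [math.tanh(total * (row + col + 1)) for col in range(CONTEXT_DIM)]
        for row in range(N_BINS)
    ]


def speech_decoder(context, weights, n_frames):
    """Speech decoder stand-in: emit spectrogram frames from the context
    vector using the weights supplied by the voice decoder."""
    frames = []
    state = list(context)
    for _ in range(n_frames):
        frames.append([sum(w * s for w, s in zip(row, state)) for row in weights])
        # Decay the state so successive frames differ (purely illustrative).
        state = [0.9 * s for s in state]
    return frames


def synthesize(text, vocal_characteristics, n_frames=5):
    """Run the full pipeline: encode, configure the decoder, decode."""
    context = encode(text)
    weights = voice_decoder(vocal_characteristics)
    return speech_decoder(context, weights, n_frames)
```

Calling `synthesize("hello", [0.2, 0.5])` and `synthesize("hello", [-0.3, 0.1])` yields different frame values for the same text, because only the weights produced by the voice decoder change.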

Status:

Grant

Type:

Utility

Filing date:

27 Sep 2019

Issue date:

28 Jun 2022