Microsoft Corporation
MULTI-SPEAKER NEURAL TEXT-TO-SPEECH SYNTHESIS

Abstract:

A method for generating speech through multi-speaker neural text-to-speech (TTS) synthesis is provided. A text input may be received (1410). Speaker latent space information of a target speaker may be provided through at least one speaker model (1420). At least one acoustic feature may be predicted through an acoustic feature predictor based on the text input and the speaker latent space information (1430). A speech waveform corresponding to the text input may be generated through a neural vocoder based on the at least one acoustic feature and the speaker latent space information (1440).
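The four steps in the abstract can be read as a simple data flow in which the speaker latent space information conditions both the acoustic feature predictor and the vocoder. The following is a minimal sketch of that flow only, not the claimed implementation: the names `SpeakerModel`, `predict_acoustic_features`, `vocode`, and `synthesize` are hypothetical, and the arithmetic inside each function is a toy placeholder standing in for trained neural networks.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class SpeakerModel:
    """Hypothetical speaker model: maps a speaker ID to a latent vector."""
    embeddings: Dict[str, List[float]]

    def latent(self, speaker_id: str) -> List[float]:
        return list(self.embeddings[speaker_id])


def predict_acoustic_features(
    text: str, speaker_latent: Sequence[float]
) -> List[List[float]]:
    """Toy stand-in for the acoustic feature predictor (step 1430).

    Emits one frame per character, shifted by the speaker latent, so the
    output depends on both the text and the speaker.
    """
    frames = []
    for ch in text:
        base = (ord(ch) % 32) / 32.0
        frames.append([base + s for s in speaker_latent])
    return frames


def vocode(
    features: Sequence[Sequence[float]],
    speaker_latent: Sequence[float],
    samples_per_frame: int = 4,
) -> List[float]:
    """Toy stand-in for the neural vocoder (step 1440).

    Like the vocoder in the abstract, it receives the speaker latent in
    addition to the acoustic features; here the latent only sets a gain.
    """
    gain = 1.0 + sum(speaker_latent) / len(speaker_latent)
    waveform: List[float] = []
    for frame in features:
        level = sum(frame) / len(frame)
        waveform.extend([gain * level] * samples_per_frame)
    return waveform


def synthesize(text: str, speaker_id: str, model: SpeakerModel) -> List[float]:
    """Run the pipeline: text (1410) -> latent (1420) -> features (1430) -> waveform (1440)."""
    latent = model.latent(speaker_id)
    features = predict_acoustic_features(text, latent)
    return vocode(features, latent)


# Illustrative speaker table; the IDs and values are made up for the sketch.
model = SpeakerModel({"speaker_a": [0.1, 0.2], "speaker_b": [0.5, -0.3]})
wave_a = synthesize("hi", "speaker_a", model)
wave_b = synthesize("hi", "speaker_b", model)
```

The point of the sketch is the wiring: the same latent vector is passed to both stages, so changing the speaker changes the waveform even when the text is identical.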

Status:

Application

Type:

Utility

Filing date:

11 Dec 2018

Issue date:

13 Jan 2022