Microsoft Corporation
MULTI-SPEAKER NEURAL TEXT-TO-SPEECH SYNTHESIS
Abstract:
A method for generating speech through multi-speaker neural text-to-speech (TTS) synthesis is provided. A text input may be received (1410). Speaker latent space information of a target speaker may be provided through at least one speaker model (1420). At least one acoustic feature may be predicted through an acoustic feature predictor based on the text input and the speaker latent space information (1430). A speech waveform corresponding to the text input may be generated through a neural vocoder based on the at least one acoustic feature and the speaker latent space information (1440).
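The data flow in the abstract can be illustrated with a toy sketch. This is not the applicant's implementation: every function name, constant, and formula below is a hypothetical stand-in chosen only to show how the speaker latent space information is passed to both the acoustic feature predictor (1430) and the vocoder (1440). A real system would use trained neural networks at each stage.

```python
import math
from typing import List

# All constants are illustrative assumptions, not values from the application.
EMBED_DIM = 4  # size of the toy speaker latent vector
N_FEATS = 3    # acoustic feature channels per frame
HOP = 2        # waveform samples emitted per acoustic frame


def speaker_model(speaker_id: int) -> List[float]:
    """Stand-in for 1420: map a target speaker to a latent vector."""
    return [math.sin(speaker_id + k) for k in range(EMBED_DIM)]


def acoustic_feature_predictor(text: str, spk: List[float]) -> List[List[float]]:
    """Stand-in for 1430: one acoustic frame per character,
    shifted by a value derived from the speaker latent vector."""
    bias = sum(spk) / len(spk)
    return [
        [(ord(ch) % 32) / 32.0 + bias * (m + 1) for m in range(N_FEATS)]
        for ch in text
    ]


def neural_vocoder(frames: List[List[float]], spk: List[float]) -> List[float]:
    """Stand-in for 1440: turn frames into bounded waveform samples,
    conditioned a second time on the speaker latent vector."""
    gain = 1.0 + 0.1 * spk[0]
    wave: List[float] = []
    for frame in frames:
        level = sum(frame) / len(frame)
        for n in range(HOP):
            # tanh keeps every sample inside [-1, 1]
            wave.append(math.tanh(gain * level * math.cos(n)))
    return wave


def synthesize(text: str, speaker_id: int) -> List[float]:
    """Chain the stages in the order given in the abstract (1410 to 1440)."""
    spk = speaker_model(speaker_id)                 # 1420
    frames = acoustic_feature_predictor(text, spk)  # 1430
    return neural_vocoder(frames, spk)              # 1440
```

Calling `synthesize("hi", 0)` and `synthesize("hi", 1)` yields waveforms of the same length but with different sample values, because the same text is conditioned on a different speaker vector at both stages.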
Status:
Application
Type:
Utility
Filing date:
11 Dec 2018
Issue date:
13 Jan 2022