Microsoft Corporation
MULTI-SPEAKER NEURAL TEXT-TO-SPEECH SYNTHESIS

Last updated: 19 Jan 2022

Abstract:

A method for generating speech through multi-speaker neural text-to-speech (TTS) synthesis is provided. A text input may be received (1410). Speaker latent space information of a target speaker may be provided through at least one speaker model (1420). At least one acoustic feature may be predicted through an acoustic feature predictor based on the text input and the speaker latent space information (1430). A speech waveform corresponding to the text input may be generated through a neural vocoder based on the at least one acoustic feature and the speaker latent space information (1440).
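The data flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names (`speaker_model`, `acoustic_feature_predictor`, `neural_vocoder`, `synthesize`), the embedding size, and the arithmetic inside each stage are hypothetical stand-ins for trained neural networks. The sketch only shows that the speaker latent space information is passed to both the acoustic feature predictor and the vocoder.

```python
import math
from typing import List, Sequence

Vector = List[float]


def speaker_model(speaker_id: int, dim: int = 4) -> Vector:
    """Toy stand-in for a speaker model (step 1420).

    Returns a deterministic vector per speaker; a real system would
    use a learned embedding or encoder instead.
    """
    return [math.sin(speaker_id + k) for k in range(dim)]


def acoustic_feature_predictor(text: str, speaker_latent: Sequence[float]) -> List[Vector]:
    """Toy stand-in for the acoustic feature predictor (step 1430).

    Emits one two-value feature frame per character, shifted by a
    speaker-dependent bias. This is an assumption for illustration only.
    """
    bias = sum(speaker_latent) / len(speaker_latent)
    return [[(ord(ch) % 32) / 32.0 + bias, bias] for ch in text]


def neural_vocoder(
    features: Sequence[Vector],
    speaker_latent: Sequence[float],
    samples_per_frame: int = 8,
) -> List[float]:
    """Toy stand-in for the neural vocoder (step 1440).

    Turns each frame into a short sinusoid whose gain depends on the
    speaker latent, so the waveform is conditioned on both inputs.
    """
    gain = 1.0 + 0.1 * speaker_latent[0]
    waveform: List[float] = []
    for frame in features:
        for n in range(samples_per_frame):
            phase = 2.0 * math.pi * frame[0] * n / samples_per_frame
            waveform.append(gain * math.sin(phase))
    return waveform


def synthesize(text: str, speaker_id: int) -> List[float]:
    """Run the pipeline: text input (1410) to waveform (1440)."""
    latent = speaker_model(speaker_id)                    # 1420
    features = acoustic_feature_predictor(text, latent)   # 1430
    return neural_vocoder(features, latent)               # 1440


if __name__ == "__main__":
    waveform = synthesize("hello", speaker_id=3)
    print(len(waveform))  # 5 frames x 8 samples per frame = 40
```

Calling `synthesize` with the same text and a different `speaker_id` yields a different waveform, which is the point of conditioning both stages on the speaker latent space information.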

Status:

Application

Type:

Utility

Filing date:

11 Dec 2018

Issue date:

13 Jan 2022

Full patent description

Patent application document

Microsoft Corporation MULTI-SPEAKER NEURAL TEXT-TO-SPEECH SYNTHESIS

Abstract:
