Microsoft Corporation
SYSTEM AND METHOD FOR CROSS-SPEAKER STYLE TRANSFER IN TEXT-TO-SPEECH AND TRAINING DATA GENERATION
Abstract:
Systems are configured for generating spectrogram data characterized by the voice timbre of a target speaker and the prosody style of a source speaker. The systems convert a waveform of source speaker data into phonetic posteriorgram (PPG) data, extract additional prosody features from the source speaker data, and generate a spectrogram based on the PPG data and the extracted prosody features. The systems are further configured to use and/or train a machine learning model for generating the spectrogram data, and to train a neural text-to-speech model with the generated spectrogram data.
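The data flow in the abstract (PPG plus prosody features in, spectrogram out) can be illustrated with a minimal, dependency-free sketch. It makes several assumptions that the abstract does not state. Per-frame acoustic-model logits are taken as given and turned into PPG frames with a softmax. Frame-level log energy and log F0 are used as the "additional prosody features," although the abstract does not name which features are extracted. A single linear map stands in for the trained spectrogram generator. The helper names softmax, frame_log_energy, build_conditioning, and toy_decoder are hypothetical and are not taken from the application.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn one frame of (assumed) acoustic-model logits into phoneme posteriors, i.e. one PPG frame."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def frame_log_energy(waveform: Sequence[float], frame_len: int, hop: int) -> List[float]:
    """Frame-level log RMS energy: one simple prosody feature (an assumed choice)."""
    energies = []
    for start in range(0, len(waveform) - frame_len + 1, hop):
        frame = waveform[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        energies.append(math.log(rms + 1e-8))  # small floor avoids log(0) on silence
    return energies


def build_conditioning(ppg: Sequence[Sequence[float]],
                       log_f0: Sequence[float],
                       log_energy: Sequence[float]) -> List[List[float]]:
    """Concatenate each PPG frame with the source speaker's prosody features for that frame."""
    if not (len(ppg) == len(log_f0) == len(log_energy)):
        raise ValueError("PPG and prosody features must have the same number of frames")
    return [list(p) + [f, e] for p, f, e in zip(ppg, log_f0, log_energy)]


def toy_decoder(conditioning: Sequence[Sequence[float]],
                weights: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical placeholder for the trained spectrogram generator:
    one linear map from each conditioning vector to len(weights) spectrogram bins."""
    return [[sum(w * x for w, x in zip(row, frame)) for row in weights]
            for frame in conditioning]


if __name__ == "__main__":
    # Toy inputs: 3 frames over 4 hypothetical phoneme classes.
    frame_logits = [[2.0, 0.5, 0.1, -1.0], [0.0, 3.0, 0.2, 0.1], [0.3, 0.1, 2.5, 0.0]]
    ppg = [softmax(f) for f in frame_logits]
    waveform = [0.5 * math.sin(0.1 * n) for n in range(400)]
    log_energy = frame_log_energy(waveform, frame_len=200, hop=100)  # 3 frames
    log_f0 = [math.log(120.0), math.log(130.0), math.log(125.0)]  # stand-in pitch track
    cond = build_conditioning(ppg, log_f0, log_energy)
    n_mels = 2  # tiny spectrogram height, for illustration only
    weights = [[0.1] * len(cond[0]) for _ in range(n_mels)]
    spectrogram = toy_decoder(cond, weights)
    print(len(spectrogram), len(spectrogram[0]))
```

The PPG is computed from the source waveform and is intended to be largely speaker-independent, so the prosody features are concatenated alongside it as separate inputs. Under this assumption, a generator trained on target-speaker data would supply the timbre. In a real system, the decoder would be a trained neural network, and the resulting spectrograms could then serve as training data for the neural text-to-speech model.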
Status:
Application
Type:
Utility
Filing date:
24 Sep 2020
Issue date:
3 Mar 2022