Meta Platforms, Inc.
Voice Separation with An Unknown Number of Multiple Speakers

Abstract:

In one embodiment, a method includes receiving a mixed audio signal comprising a mixture of voice signals associated with a plurality of speakers, generating first audio signals by processing the mixed audio signal using a first machine-learning model configured with a first number of output channels, determining that at least one of the first number of output channels is silent based on the first audio signals, generating second audio signals by processing the mixed audio signal using a second machine-learning model configured with a second number of output channels that is fewer than the first number of output channels, determining that each of the second number of output channels is non-silent based on the second audio signals, and using the second machine-learning model to separate additional mixed audio signals associated with the plurality of speakers.
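The selection procedure in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the mean-power silence test, the threshold value, and the names `is_silent`, `select_separator`, and `make_toy_separator` are all assumptions introduced here, and the toy separators merely stand in for trained machine-learning models. The sketch also generalizes the two-model case to any set of candidate models, tried from the most output channels to the fewest.

```python
from typing import Callable, Dict, List, Sequence

Signal = List[float]
# A separator maps one mixed signal to a list of output channels.
Separator = Callable[[Sequence[float]], List[Signal]]


def is_silent(channel: Sequence[float], threshold: float = 1e-3) -> bool:
    """Assumed criterion: a channel is silent if its mean power is below threshold."""
    if not channel:
        return True
    power = sum(x * x for x in channel) / len(channel)
    return power < threshold


def select_separator(
    mixture: Sequence[float],
    separators: Dict[int, Separator],
    threshold: float = 1e-3,
) -> int:
    """Return the channel count of the first model whose outputs are all non-silent.

    Models are tried from the most output channels to the fewest.
    """
    for num_channels in sorted(separators, reverse=True):
        outputs = separators[num_channels](mixture)
        if not any(is_silent(ch, threshold) for ch in outputs):
            return num_channels
    # Assumed fallback: no model passed, so use the smallest one.
    return min(separators)


def make_toy_separator(num_channels: int, active: int) -> Separator:
    """Hypothetical stand-in for a trained model: fills `active` channels, zeros the rest."""
    def separate(mixture: Sequence[float]) -> List[Signal]:
        outputs: List[Signal] = []
        for i in range(num_channels):
            if i < active:
                outputs.append([x / active for x in mixture])
            else:
                outputs.append([0.0] * len(mixture))
        return outputs
    return separate


# Toy mixture that behaves as if it contained two speakers.
mixture = [0.5, -0.4, 0.3, -0.2]
separators = {k: make_toy_separator(k, min(k, 2)) for k in (2, 3, 4)}
chosen = select_separator(mixture, separators)
print(chosen)
```

In this toy setup the 4-channel and 3-channel stand-ins each produce at least one all-zero channel, so the search settles on the 2-channel model, which would then be reused for further mixtures from the same speakers.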

Status:

Application

Type:

Utility

Filing date:

20 Apr 2020

Issue date:

19 Aug 2021