Alibaba Group Holding Limited
Methods, apparatuses, systems, devices, and computer-readable storage media for processing speech signals based on horizontal and pitch angles and distance of a sound source relative to a microphone array

Abstract:

The disclosed embodiments provide methods, apparatuses, systems, devices, and computer-readable storage media for processing speech signals. The method comprises: acquiring a real-time image by using an image capturing device, performing facial recognition on the real-time image, and detecting, based on a facial recognition result, a period during which a target user makes speech sounds; locating a sound source in an audio signal received by a microphone array, and determining orientation information of the sound source in the audio signal; and, based on the period during which the target user in the real-time image makes the speech sounds and the orientation information of the sound source, performing speech start and end point analysis to determine the start and end time points of the speech sounds in the audio signal. The method for processing speech signals according to one embodiment can perform voice activity detection on the speech signal in noisy environments containing multiple sources of interference, thereby improving the anti-interference capability of the system.
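The abstract does not state how the face-based speaking period and the sound source orientation are combined. One minimal way to picture the fusion step, purely as an assumption, is a per-frame AND rule: a frame counts as speech only when the facial cue says the target user is speaking and the localized source lies within an angular tolerance of the user's direction; runs of such frames, bridged across short gaps, yield start and end time points. The `Frame` fields, `angular_diff`, `detect_endpoints`, `tol_deg`, and `min_gap` are hypothetical names, and the sketch compares only the horizontal angle, ignoring the pitch angle and distance mentioned in the title.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Frame:
    """One analysis frame. All fields are hypothetical inputs for this sketch."""
    t: float                         # frame timestamp, seconds
    target_speaking: bool            # facial cue: target user appears to be speaking
    target_azimuth: float            # horizontal angle of the target user, degrees
    source_azimuth: Optional[float]  # horizontal angle of the localized source, or None


def angular_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees (handles wrap-around)."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def detect_endpoints(frames: List[Frame], tol_deg: float = 15.0,
                     min_gap: float = 0.25) -> List[Tuple[float, float]]:
    """Return (start, end) times of speech attributed to the target user.

    Assumed fusion rule: a frame is active only if the facial cue says the
    target is speaking AND the source direction matches the target's
    direction within tol_deg. A segment is closed at its last active frame
    once an inactive frame occurs more than min_gap seconds after it.
    """
    segments: List[Tuple[float, float]] = []
    start: Optional[float] = None
    last_active: Optional[float] = None
    for f in frames:
        active = (f.target_speaking
                  and f.source_azimuth is not None
                  and angular_diff(f.source_azimuth, f.target_azimuth) <= tol_deg)
        if active:
            if start is None:
                start = f.t          # speech start point
            last_active = f.t
        elif start is not None and f.t - last_active > min_gap:
            # Inactive for too long: close the segment at the last active frame.
            segments.append((start, last_active))
            start = None
    if start is not None:
        # Input ended while the target was still speaking.
        segments.append((start, last_active))
    return segments


if __name__ == "__main__":
    # Target user at 30 deg; an interferer at 120 deg is heard in the first two frames.
    cues = [(False, 120.0)] * 2 + [(True, 32.0)] * 4 + [(False, None)] * 4
    frames = [Frame(round(0.1 * i, 1), speaking, 30.0, az)
              for i, (speaking, az) in enumerate(cues)]
    print(detect_endpoints(frames))  # prints [(0.2, 0.5)]
```

In the usage example the interferer's frames are rejected because they fail both the facial cue and the direction check, so only the target user's speech from 0.2 s to 0.5 s is reported. A real system would likely use soft scores and smoothing rather than a hard AND rule; the hard rule is chosen here only to keep the sketch short.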

Status:
Grant
Type:

Utility

Filing date:

28 Aug 2019

Issue date:

26 Jul 2022