Cisco Systems, Inc.
Positional audio metadata generation

Last updated:

Abstract:

At a video conference endpoint including a camera, a microphone array, and one or more microphone assemblies, the video conference endpoint may divide a video output of the camera into one or more tracking sectors and detect a head position for each participant in the video output. The video conference endpoint may determine within which tracking sector each detected head position is located. The video conference endpoint may determine active sound source positions of the actively speaking participants based on sound detected or captured by the microphone array and microphone assemblies, and may determine within which tracking sector the active sound source positions are located. For each tracking sector that contains an active sound source position, the video conference endpoint may update the positional audio metadata for that tracking sector based on the active sound source positions and the detected head positions located in that tracking sector.
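The per-sector update described above can be sketched in code. This is a hypothetical illustration, not the patented implementation: the number of sectors, the use of normalized one-dimensional positions, and the rule of averaging head and sound-source positions within a sector are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Sector:
    index: int
    x_min: float  # left edge of the sector in normalized frame coordinates
    x_max: float  # right edge

def divide_into_sectors(num_sectors: int) -> list[Sector]:
    """Divide the camera's video output into equal-width tracking sectors."""
    width = 1.0 / num_sectors
    return [Sector(i, i * width, (i + 1) * width) for i in range(num_sectors)]

def sector_of(x: float, sectors: list[Sector]) -> int:
    """Return the index of the tracking sector containing position x."""
    for s in sectors:
        if s.x_min <= x < s.x_max:
            return s.index
    return len(sectors) - 1  # clamp positions at the right edge

def update_positional_metadata(
    sectors: list[Sector],
    head_positions: list[float],         # detected head positions
    sound_source_positions: list[float], # active sound source positions
) -> dict[int, float]:
    """For each sector containing an active sound source, compute updated
    positional audio metadata (here assumed to be the mean of the head and
    sound-source positions located in that sector)."""
    metadata: dict[int, float] = {}
    for src in sound_source_positions:
        idx = sector_of(src, sectors)
        heads_here = [h for h in head_positions if sector_of(h, sectors) == idx]
        samples = heads_here + [src]
        metadata[idx] = sum(samples) / len(samples)
    return metadata

# Example: four sectors, three detected heads, one active speaker near 0.62.
sectors = divide_into_sectors(4)
meta = update_positional_metadata(sectors, [0.1, 0.6, 0.9], [0.62])
```

Only sectors containing an active sound source receive an update; sectors with silent participants keep their previous metadata, mirroring the per-sector condition stated in the abstract.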

Status:
Grant
Type:

Utility

Filing date:

14 Dec 2020

Issue date:

7 Sep 2021