International Business Machines Corporation
Diarization driven by the ASR based segmentation

Abstract:

An approach is provided that receives an audio stream and uses a voice activation detection (VAD) process to create a digital audio stream of voices from at least two different speakers. An automatic speech recognition (ASR) process is applied to the digital audio stream and produces the spoken words. A speaker turn detection (STD) process is then applied to those words to identify a number of speaker segments, with each speaker segment ending at a word boundary. A speaker clustering algorithm is then applied to the speaker segments to associate one of the speakers with each of the speaker segments.
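
The last two stages described in the abstract (turn detection at word boundaries, then clustering of the resulting segments) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the `Word` record, the toy two-dimensional `voice` feature attached to each word, the cosine-distance rule used for turn detection, the greedy clustering, and both thresholds. The VAD and ASR stages are not modeled; the sketch starts from a non-empty list of words with timings, as an ASR process might emit them.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Word:
    text: str
    start: float   # word start time in seconds
    end: float     # word end time in seconds
    voice: tuple   # toy per-word voice feature (hypothetical, non-zero)


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def mean_vector(vectors):
    n = len(vectors)
    return tuple(sum(col) / n for col in zip(*vectors))


def detect_speaker_turns(words, threshold=0.5):
    """Split a non-empty word list into segments.

    Cuts are only ever placed between two words, so every segment
    ends at a word boundary. The distance rule is an assumption.
    """
    segments, current = [], [words[0]]
    for prev, word in zip(words, words[1:]):
        if cosine_distance(prev.voice, word.voice) > threshold:
            segments.append(current)
            current = []
        current.append(word)
    segments.append(current)
    return segments


def cluster_segments(segments, threshold=0.5):
    """Greedy clustering: reuse the nearest speaker or open a new one.

    Each centroid is fixed to the first segment assigned to it and is
    not updated afterwards, which keeps the sketch short.
    """
    centroids, labels = [], []
    for seg in segments:
        vec = mean_vector([w.voice for w in seg])
        dists = [cosine_distance(vec, c) for c in centroids]
        if dists and min(dists) <= threshold:
            labels.append(dists.index(min(dists)))
        else:
            centroids.append(vec)
            labels.append(len(centroids) - 1)
    return labels


# Toy input standing in for ASR output (all values invented).
words = [
    Word("hello", 0.0, 0.4, (1.0, 0.1)),
    Word("there", 0.4, 0.8, (0.9, 0.2)),
    Word("hi", 1.0, 1.2, (0.1, 1.0)),
    Word("again", 1.5, 1.9, (1.0, 0.0)),
]
segments = detect_speaker_turns(words)
labels = cluster_segments(segments)
for seg, label in zip(segments, labels):
    text = " ".join(w.text for w in seg)
    print(f"speaker {label}: {seg[0].start:.1f}-{seg[-1].end:.1f}s {text}")
```

Because segmentation operates on recognized words rather than on raw audio frames, no segment can split a word in two; a practical system would likely replace the toy feature with speaker embeddings and the greedy pass with a more robust clustering method.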

Status:

Grant

Type:

Utility

Filing date:

21 Nov 2017

Issue date:

14 Sep 2021