State-of-the-art speaker diarization systems for meetings are now at a point where overlapped speech contributes significantly to the errors made by the system. However, little if no work has yet been done on detecting overlapped speech. We present our initial work toward developing an overlap detection system for improved meeting diarization. We investigate various features, with a focus on high-precision performance for use in the detector, and examine performance results on a subset of the AMI Meeting Corpus. For the high-quality signal case of a single mixed-headset channel signal, we demonstrate a relative improvement of about 7.4% DER over the baseline diarization system, while for the more challenging case of the single far-field channel signal relative improvement is 3.6%. We also outline steps towards improvement and moving beyond this initial phase.
Kofi Boakye, B. Trueba-Hornero, Oriol Vinyals, Ger