ICASSP
2008
IEEE

Speaker indexing and speech enhancement in real meetings / conversations

This paper presents a speaker indexing method that uses a small number of microphones to estimate who spoke when. The proposed method combines a noise-robust voice activity detector (VAD), a GCC-PHAT-based direction-of-arrival (DOA) estimator, and a DOA classifier. Using the estimated speaker indexing information, we can also enhance each speaker's utterances with a maximum signal-to-noise-ratio (MaxSNR) beamformer. We apply the system to real meetings and conversations recorded in a room with a reverberation time of 350 ms, and evaluate its performance with a standard measure, the diarization error rate (DER). Even for real conversations, which contain many speaker turn-takings and overlaps, the speaker error time was very small with the proposed system. We are planning to demonstrate a real-time speaker indexing system at ICASSP 2008.
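The abstract does not give implementation details, so the following is only a minimal sketch of the generic GCC-PHAT technique for one microphone pair, not the authors' system. It assumes a far-field source, two microphones, and NumPy; the function names, the `mic_distance` parameter, and the speed of sound value are all hypothetical choices made for illustration.

```python
import numpy as np


def gcc_phat_delay(sig, ref, fs, max_tau=None):
    """Estimate the delay (in seconds) of `sig` relative to `ref` with GCC-PHAT.

    A positive result means `sig` lags `ref`. Illustrative sketch only.
    """
    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)

    # Cross-power spectrum with PHAT weighting: keep the phase, drop the magnitude.
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12  # small constant avoids division by zero
    cc = np.fft.irfft(cross, n=n)

    # Limit the search to physically plausible lags when max_tau is given.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs


def delay_to_doa_degrees(tau, mic_distance, speed_of_sound=343.0):
    """Convert a delay to a far-field arrival angle (assumed geometry).

    The angle is measured from the broadside of a two-microphone pair.
    """
    ratio = np.clip(speed_of_sound * tau / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(ratio)))
```

In a pipeline like the one described, such per-frame DOA estimates would be computed only on frames that the VAD marks as speech and then grouped by a classifier to assign frames to speakers; how the paper performs those steps is not stated in the abstract.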
Added 30 May 2010
Updated 30 May 2010
Type Conference
Year 2008
Where ICASSP
Authors Shoko Araki, Masakiyo Fujimoto, Kentaro Ishizuka, Hiroshi Sawada, Shoji Makino