Multimodal Speaker Segmentation in Presence of Overlapped Speech Segments

We propose a multimodal speaker segmentation algorithm with two main contributions. First, we propose a hidden Markov model architecture that fuses three modalities: a multi-camera system for participant localization, a microphone array for speaker localization, and a speaker identification system. Second, we present a novel method for handling overlapped speech segments through a likelihood model of the microphone array observations that uses multiple local maxima of the Steered Power Response Generalized Cross Correlation Phase Transform (SPR-GCC-PHAT) function within the Joint Probabilistic Data Association (JPDA) framework. Results show that, on datasets with a significant portion (27.4%) of overlapped speech, the proposed method outperforms standard speaker segmentation systems based on (a) speaker identification and (b) microphone array processing, and scores as high as 94.4% on the F-measure.
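
The microphone-array likelihood described above keeps several local maxima of a GCC-PHAT function instead of only the global peak, so that two simultaneous talkers can each leave a trace. The sketch below illustrates that idea under simplifying assumptions: it uses a single microphone pair and a 1-D time-difference-of-arrival (TDOA) axis rather than the steered power response over 3-D locations used in the paper, and the helper names `gcc_phat` and `top_local_maxima` are hypothetical.

```python
import numpy as np


def gcc_phat(x, y, fs, max_tau=None):
    """GCC-PHAT between two microphone signals.

    Returns (lags_in_seconds, correlation). A positive lag means x is a
    delayed copy of y. Simplified single-pair sketch (assumption), not the
    multi-microphone SPR variant from the paper.
    """
    n = len(x) + len(y)  # zero-pad so the circular correlation equals the linear one
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    R = X * np.conj(Y)
    R /= np.abs(R) + 1e-12  # PHAT weighting: keep the phase, discard the magnitude
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(round(fs * max_tau)), max_shift)
    # Reorder so lags -max_shift .. +max_shift are contiguous
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    lags = np.arange(-max_shift, max_shift + 1) / fs
    return lags, cc


def top_local_maxima(lags, cc, k):
    """Return up to k (lag, value) pairs at interior local maxima, strongest first."""
    idx = [i for i in range(1, len(cc) - 1) if cc[i] > cc[i - 1] and cc[i] >= cc[i + 1]]
    idx.sort(key=lambda i: cc[i], reverse=True)
    return [(float(lags[i]), float(cc[i])) for i in idx[:k]]
```

For example, if two white-noise sources reach the second microphone 5 samples later and 8 samples earlier than the first, the two strongest local maxima of `gcc_phat(mic2, mic1, fs=16000, max_tau=0.001)` fall at lags of +5 and -8 samples, one per talker. Keeping only the global maximum would discard one of them, which is the failure mode during overlapped speech that the paper targets.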

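In the JPDA framework, each candidate peak may be explained by at most one active speaker or treated as clutter, and each active speaker may also go undetected. A likelihood of this kind could serve as the microphone-array term for a hypothesis about which speakers are currently active. The following is a minimal sketch under stated assumptions, not the paper's exact model: observations are 1-D TDOA peaks (e.g., local maxima of a GCC-PHAT function), measurement noise is Gaussian with standard deviation `sigma`, each speaker is detected with probability `p_detect`, and clutter is Poisson with rate `clutter_rate`, spread uniformly over `lag_range`. The function name `jpda_likelihood` and all parameter names are hypothetical.

```python
import math
from typing import FrozenSet, Sequence


def gaussian_pdf(x: float, mean: float, sigma: float) -> float:
    """1-D Gaussian density."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def jpda_likelihood(
    observed_lags: Sequence[float],
    speaker_lags: Sequence[float],
    sigma: float,
    p_detect: float,
    clutter_rate: float,
    lag_range: float,
) -> float:
    """Sum over all feasible joint associations of peaks to hypothesized speakers.

    Returns a value proportional to p(peaks | active speakers); the omitted
    Poisson normalization does not depend on the speaker hypothesis.
    """
    clutter_density = clutter_rate / lag_range  # Poisson clutter, uniform over lags

    def recurse(s: int, used: FrozenSet[int]) -> float:
        if s == len(speaker_lags):
            # Every observation not assigned to a speaker is clutter
            return clutter_density ** (len(observed_lags) - len(used))
        # Case 1: speaker s produced no peak (missed detection)
        total = (1.0 - p_detect) * recurse(s + 1, used)
        # Case 2: speaker s produced one of the still-unassigned peaks
        for j, z in enumerate(observed_lags):
            if j not in used:
                total += p_detect * gaussian_pdf(z, speaker_lags[s], sigma) * recurse(s + 1, used | {j})
        return total

    return recurse(0, frozenset())
```

With two peaks near lags 5.2 and -7.9 samples (`sigma=1.0`, `p_detect=0.9`, `clutter_rate=1.0`, `lag_range=33.0`), the hypothesis that speakers at lags 5 and -8 are both active scores higher than either single-speaker hypothesis. With only one peak near 5.1, the single-speaker hypothesis wins instead. The enumeration is exponential in the number of speakers, which is acceptable for the handful of participants in a meeting.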
Viktor Rozgic, Kyu Jeong Han, Panayiotis G. Georgiou, Shrikanth S. Narayanan

ISM 2008 | Microphone Array | Multimedia | Speaker Segmentation | Speaker Segmentation Algorithm |

Post Info

Added	31 May 2010
Updated	31 May 2010
Type	Conference
Year	2008
Where	ISM
Authors	Viktor Rozgic, Kyu Jeong Han, Panayiotis G. Georgiou, Shrikanth S. Narayanan

Sciweavers