

Visual speaker localization aided by acoustic models

This paper presents a novel audio-visual approach to unsupervised speaker localization. Using recordings from a single, low-resolution room overview camera and a single far-field microphone, a state-of-the-art audio-only speaker localization system (traditionally called speaker diarization) is extended so that both acoustic and visual models are estimated as part of a joint unsupervised optimization problem. The speaker diarization system first automatically determines the number of speakers and estimates “who spoke when”; then, in a second step, the visual models are used to infer the location of the speakers in the video. The experiments were performed on real-world meetings using 4.5 hours of the publicly available AMI meeting corpus. The proposed system is able to exploit audio-visual integration not only to improve the accuracy of a state-of-the-art (audio-only) speaker diarization system, but also to add visual speaker localization at little incremental engineering and compu...
Gerald Friedland, Chuohao Yeo, Hayley Hung
Added 28 May 2010
Updated 28 May 2010
Type Conference
Year 2009
Where MM (ACM Multimedia)
Authors Gerald Friedland, Chuohao Yeo, Hayley Hung