Exploring Co-Occurence Between Speech and Body Movement for Audio-Guided Video Localization

This paper presents a bottom-up approach that combines audio and video to simultaneously locate individual speakers in the video (2-D source localization) and segment their speech (speaker diarization), in meetings recorded by a single stationary camera and a single microphone. The novelty lies in using motion information from the entire body rather than just the face to perform these tasks, which permits processing non-frontal views, unlike previous work. Since body movements do not exhibit instantaneous signal-level synchrony with speech, the approach targets long-term co-occurrences between audio and video subspaces. First, temporal clustering of the audio produces a large number of intermediate clusters, each containing speech from only a single speaker. Then, spatial clustering is performed in the video frames of each cluster by a novel eigen-analysis method to find the region of dominant motion. This region is associated with the speech assuming that a speaker exhibits more movemen...
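To make the pipeline described in the abstract concrete, the sketch below illustrates one plausible reading of the video step: given the frames that fall inside one audio cluster (i.e., one speaker's speech segments), estimate the region of dominant motion from the leading principal component of per-pixel motion energy and take its peak as a crude 2-D localization. The function name `dominant_motion_region`, the `audio_clusters` structure, and the use of frame differencing plus SVD are all illustrative assumptions, not the authors' exact eigen-analysis formulation.

import numpy as np

def dominant_motion_region(frames):
    """Rough sketch: locate the region of dominant motion in the video
    frames belonging to one audio cluster.

    frames: array of shape (T, H, W), grayscale frames.
    Returns (row, col) of the peak of the leading motion eigen-image.

    This approximates the paper's eigen-analysis idea; it is not the
    authors' published method.
    """
    # Per-pixel motion energy: absolute difference between consecutive frames.
    motion = np.abs(np.diff(frames.astype(np.float64), axis=0))  # (T-1, H, W)
    t, h, w = motion.shape
    x = motion.reshape(t, h * w)
    x -= x.mean(axis=0, keepdims=True)

    # Leading right singular vector = first principal component of the
    # motion-energy patterns: the spatial pattern explaining the most
    # motion variance over the cluster's duration.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    eigen_image = np.abs(vt[0]).reshape(h, w)

    # Peak of the eigen-image as a simple 2-D speaker localization.
    r, c = np.unravel_index(np.argmax(eigen_image), eigen_image.shape)
    return r, c

# Hypothetical usage: `audio_clusters` maps a cluster id (one speaker's
# speech, obtained from a prior audio diarization step) to the stack of
# video frames recorded during that speech.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio_clusters = {"spk0": rng.random((30, 48, 64))}
    for cid, cluster_frames in audio_clusters.items():
        print(cid, dominant_motion_region(cluster_frames))

In the paper's setting, the localized motion region is then associated with the cluster's speech under the assumption that the active speaker moves more than the listeners; this sketch only covers the per-cluster localization step.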
Type: Journal
Year: 2008
Where: TCSV
Authors: H. Vajaria, S. Sarkar, R. Kasturi