We present a prototype that automatically aligns acoustic musical signals with their corresponding textual lyrics, in a manner similar to manually-aligned karaoke. We tackle this ...
Ye Wang, Min-Yen Kan, Tin Lay Nwe, Arun Shenoy, Ju...
Face-to-face meetings usually encompass several modalities including speech, gesture, handwriting, and person identification. Recognition and integration of each of these modaliti...
Michael Bett, Ralph Gross, Hua Yu, Xiaojin Zhu, Yu...
Conventional approaches to video annotation predominantly focus on supervised identification of a limited set of concepts, while unsupervised annotation with infinite vocabulary...
Emily Moxley, Tao Mei, Xian-Sheng Hua, Wei-Ying Ma...
In this paper, we present an approach for speaker change detection in broadcast video using joint audio-visual scene change statistics. Our experiments indicate that using joint a...
This paper investigates the automatic analysis and segmentation of meetings. A meeting is analysed in terms of individual behaviours and group interactions, in order to decompose e...