In this paper we report on the acquisition and content of a new database intended for developing audio-visual speech recognition systems. This database supports a speaker dependen...
Abstract. Emerging electronic text formats include hierarchical structure and visualization-related information that current Text-to-Speech (TtS) systems ignore. In this paper we p...
We propose a framework for modeling, analysis, annotation and synthesis of multi-modal dance performances. We analyze correlations between music features and dance figure labels ...
Ferda Ofli, Engin Erzin, Yucel Yemez, A. Murat Tek...
While it is clear that the full emotional effect of a movie scene is carried through the successful interpretation of audio and visual information, music still carries a significa...
Aida Austin, Elliot Moore II, Udit Gupta, Parag Ch...
In this paper we address the problem of estimating who is speaking from automatically extracted low-resolution visual cues in group meetings. Traditionally, the task of speech/non...