Multimodal speech and speaker modeling and recognition are widely accepted as vital aspects of state of the art human-machine interaction systems. While correlations between speec...
Mehmet Emre Sargin, Oya Aran, Alexey Karpov, Ferda...
We consider two scenarios of naming people in databases of news photos with captions: (i) finding faces of a single person, and (ii) assigning names to all faces. We combine an in...
Matthieu Guillaumin, Thomas Mensink, Jakob J. Verb...
In developing automated systems to recognize the emotional content of music, we are faced with a problem spanning two disparate domains: the space of human emotions and the acoust...
Erik M. Schmidt, Douglas Turnbull, Youngmoo E. Kim
Fitting statistical 2D and 3D shape models to images is necessary for a variety of tasks, such as video editing and face recognition. Much progress has been made on local fitting...
We investigate the problem of automatically labelling
faces of characters in TV or movie material with their
names, using only weak supervision from automaticallyaligned
subtitl...