This paper presents a new system for recognition, tracking and pose estimation of people in video sequences. It is based on the wavelet transform of the upper body and uses ...
Philipp Zehnder, Esther Koller-Meier, Luc J. Van G...
We describe the ICSI-SRI-UW team’s entry in the Spring 2004 NIST Meeting Recognition Evaluation. The system was derived from SRI’s 5xRT Conversational Telephone Speech (CTS) r...
Chuck Wooters, Nikki Mirghafori, Andreas Stolcke, ...
A polyglot text-to-speech synthesis system which is able to read aloud mixed-lingual text has first of all to derive the correct pronunciation. This is achieved with an accurate m...
This paper presents a framework for corpus-based multimodal research. Part of this framework is applied in the context of meeting modelling. A generic model for differen...
This paper presents a shallow dialogue analysis model, aimed at human-human dialogues in the context of staff or business meetings. Four components of the model are defined, and ...
Andrei Popescu-Belis, Alexander Clark, Maria Georg...
Combining multiple information sources, typically from several data streams, is a very promising approach, both in experiments and, to some extent, in various real-life applications. ...
Video document retrieval is now an active part of the domain of multimedia retrieval. However, unlike for other media, the management of a collection of video documents adds the pr...
We present a method for face detection which uses a new SVM structure trained in an expert manner in the eigenface space. This robust method has been introduced as a post process...
As the amount of multimodal meetings data being recorded increases, so does the need for sophisticated mechanisms for accessing this data. This process is complicated by the diffe...