The use of visual information derived from accurate lip extraction, can provide features invariant to noise perturbation for speech recognition systems and can be also used in a w...
An unsupervised discriminative training procedure is proposed for estimating a language model (LM) for machine translation (MT). An English-to-English synchronous context-free gra...
Zhifei Li, Ziyuan Wang, Sanjeev Khudanpur, Jason E...
We present the first user study of out-of-turn interaction in menu-based, interactive voice-response systems. Out-ofturn interaction is a technique which empowers the user (unable...
Saverio Perugini, Taylor J. Anderson, William F. M...
Recognizing speech, gestures, and visual features are important interface capabilities for embedded mobile systems. Perception algorithms have many traits in common with more conv...
The Informedia Digital Video Library system extracts information from digitized video sources and allows full content search and retrieval over all extracted data. This extracted ...
Howard D. Wactlar, Alexander G. Hauptmann, Michael...