We address the problem of segmentation and recognition of sequences of multimodal human interactions in meetings. These interactions can be seen as a rough structure of a meeting, ...
Marc Al-Hames, Alfred Dielmann, Daniel Gatica-Pere...
Biomedical images are invaluable in establishing diagnosis, acquiring technical skills, and implementing best practices in many areas of medicine. At present, images needed for in...
Daekeun You, Sameer Antani, Dina Demner-Fushman, M...
Long-span features, such as syntax, can improve language models for tasks such as speech recognition and machine translation. However, these language models can be difficult to u...
Efficient multimodal fusion is a key feature of future video indexing systems. Hidden Markov Models provide a powerful framework for video structure analysis but they require all...
This paper describes the development of a rule-based computational model of how a feature-based representation of shared visual information combines with linguistic cu...