Eye gaze and gesture form key conversational grounding cues that are used extensively in face-to-face interaction among people. To accurately recognize visual feedback during inter...
Recent content-based video retrieval systems combine the output of concept detectors (also known as high-level features) with text obtained through automatic speech recognition. This ...
Robin Aly, Djoerd Hiemstra, Arjen P. de Vries, Fra...
It is difficult to track, parse and model human-computer interactions during editing and revising of documents, but it is necessary if we are to develop automated technol...
The main task of a voice-enabled tour-guide robot in a mass exhibition setting is to engage visitors in dialogue and provide as much exhibit information as possible in a limited ...
By learning a range of possible times over which the effect of an action can take place, a robot can reason more effectively about causal and contingent relationships in the world...