Untethered multimodal interfaces are more attractive than tethered ones because they are more natural and expressive for interaction. Such interfaces usually require robust vision...
My thesis aims to contribute towards building autonomous agents that are able to understand their surrounding environment through the use of both audio and visual information. To ...
The inability to read visual text has a huge impact on the quality of life of visually impaired people. One of the most anticipated devices is a wearable camera capable of findi...
A robot’s ability to assist humans in a variety of tasks, e.g. in search and rescue or in a household, heavily depends on the robot’s reliable recognition of the objects in th...
In this paper, we present an approach for speaker change detection in broadcast video using joint audio-visual scene change statistics. Our experiments indicate that using joint a...