We introduce the first visual dataset of fast foods with a total of 4,545 still images, 606 stereo pairs, 303 3600 videos for structure from motion, and 27 privacy-preserving vide...
This paper addresses the problem of probabilistic recognition of activities from local spatio-temporal appearance. Joint statistics of space-time filters are employed to define hi...
Untethered multimodal interfaces are more attractive than tethered ones because they are more natural and expressive for interaction. Such interfaces usually require robust vision...
In this paper, we develop a novel method for view-based recognition of human action/activity from videos. By observing just a few frames, we can identify the activity that takes pl...
Long-duration tracking of general targets is quite challenging for computer vision, because in practice target may undergo large uncertainties in its visual appearance and the unc...