We describe an approach to object and scene retrieval which searches for and localizes all the occurrences of a user outlined object in a video. The object is represented by a set...
There are a huge number of videos with text tags on the Web nowadays. In this paper, we propose a method of automatically extracting from Web videos video shots corresponding to s...
Recent work shows how to use local spatio-temporal features to learn models of realistic human actions from video. However, existing methods typically rely on a predefined spatial...
In this paper we explore the interlink between temporally dense view-based object recognition and sparse image representations with local keypoints. The temporal component is an a...
The notion of local features in space-time has recently been proposed to capture and describe local events in video. When computing space-time descriptors, however, the result may...