Building robust low and mid-level image representations, beyond edge primitives, is a long-standing goal in vision. Many existing feature detectors spatially pool edge information...
Matthew Zeiler, Dilip Krishnan, Graham Taylor, Rob...
We present an automatic and efficient method to extract spatio-temporal human volumes from video, which combines top-down model-based and bottom-up appearancebased approaches. Fr...
Web photos in social media sharing websites such as Flickr are generally accompanied by rich but noisy textual descriptions (tags, captions, categories, etc.). In this paper, we p...
Tracking-by-detection is increasingly popular in order to tackle the visual tracking problem. Existing adaptive methods suffer from the drifting problem, since they rely on selfup...
Jakob Santner, Christian Leistner, Amir Saffari, T...
Recent work in object localization has shown that the use of contextual cues can greatly improve accuracy over models that use appearance features alone. Although many of these mo...
Carolina Galleguillos, Brian McFee, Gert Lanckriet
We present a method which enables rapid and dense reconstruction of scenes browsed by a single live camera. We take point-based real-time structure from motion (SFM) as our starti...
Acquiring and representing the 4D space of rays in the world (the light field) is important for many computer vision and graphics applications. Yet, light field acquisition is c...
Markov random fields with higher order potentials have emerged as a powerful model for several problems in computer vision. In order to facilitate their use, we propose a new rep...
In this paper we considerably improve on a state-of-theart alpha matting approach by incorporating a new prior which is based on the image formation process. In particular, we mod...
Christoph Rhemann, Carsten Rother, Pushmeet Kohli,...
We propose a visual event recognition framework for consumer domain videos by leveraging a large amount of loosely labeled web videos (e.g., from YouTube). First, we propose a new...