The saliency of regions or objects in an image can be significantly boosted if they recur in multiple images. Leveraging this idea, cosegmentation jointly segments common regions...
Gunhee Kim, Eric P. Xing, Li Fei-Fei, Takeo Kanade
We present a new framework in which image segmentation, figure/ground organization, and object detection all appear as the result of solving a single grouping problem. This frame...
Human-nameable visual “attributes” can benefit various recognition tasks. However, existing techniques restrict these properties to categorical labels (for example, a person ...
In this work, we propose to use attributes and parts for recognizing human actions in still images. We define action attributes as the verbs that describe the properties of human...
Bangpeng Yao, Xiaoye Jiang, Aditya Khosla, Andy La...
Complex human activities occurring in videos can be defined in terms of temporal configurations of primitive actions. Prior work typically hand-picks the primitives, their total...
Videos usually consist of activities involving interactions between multiple actors, sometimes referred to as complex activities. Recognition of such activities requires modeling ...
Utkarsh Gaur, Yingying Zhu, Bi Song, Amit Roy-Chow...
In this work we present a new crowd analysis algorithm powered by behavior priors that are learned on a large database of crowd videos gathered from the Internet. The algorithm wo...
Mikel Rodriguez, Josef Sivic, Ivan Laptev, Jean-Yv...
A class of techniques in computer vision and graphics is based on capturing multiple images of a scene under different illumination conditions. These techniques explore variations...
Interpreting an image as a function on a compact subset of the Euclidean plane, we get its scale-space by diffusion, spreading the image over the entire plane. This generates a 1-...
Many applications involve multiple modalities, such as text and images, that describe the problem of interest. In order to leverage the information present in all the modalities, on...