Abstract. Even a relatively unstructured captioned image set depicting a variety of objects in cluttered scenes contains strong correlations between caption words and repeated visu...
The appearance of dynamic scenes is often largely governed by a latent low-dimensional dynamic process. We show how to learn a mapping from video frames to this low-dimensional rep...
This paper proposes a novel image modeling scheme for object detection and localization. Object appearance is modeled by the joint distribution of k-tuple salient point feature ve...
Xiang Sean Zhou, Baback Moghaddam, Thomas S. Huang
We present a framework for learning object representations for fast recognition of a large number of different objects. Rather than learning and storing feature representations s...
People as news subjects carry rich semantics in broadcast news video and therefore finding a named person in the video is a major challenge for video retrieval. This task can be ac...
Jun Yang, Ming-yu Chen, Alexander G. Hauptman...