Abstract— Given an unstructured collection of captioned images of cluttered scenes featuring a variety of objects, our goal is to simultaneously learn the names and appearances o...
Michael Jamieson, Afsaneh Fazly, Suzanne Stevenson...
In this paper, we propose a method to rank the highlights of broadcast racquet sports videos. Compared with previous work, we integrate relevance feedback into highlight ranking f...
—Inexpensive RGB-D cameras that give an RGB image together with depth data have become widely available. We use this data to build 3D point clouds of a full scene. In this paper,...
Hema Swetha Koppula, Abhishek Anand, Thorsten Joac...
Psychologists have proposed that many human-object interaction activities form unique classes of scenes. Recognizing these scenes is important for many social functions. To enable...
Based on perceptual and computational attention modeling studies, we formulate measures of saliency for an audiovisual stream. Audio saliency is captured by signal modulations and...