A spoken language generation system has been developed that learns to describe objects in computer-generated visual scenes. The system is trained by a `show-and-tell' procedu...
A visual word lexicon can be constructed by clustering primitive visual features, and a visual object can be described by a set of visual words. Such a "bag-of-words" re...
—This paper presents an automatic segmentation algorithm for video frames captured by a (monocular) webcam that closely approximates depth segmentation from a stereo camera. The ...
Pei Yin, Antonio Criminisi, John M. Winn, Irfan A....
Abstract. In this paper, several important issues related to visual motion analysis are addressed with a focus on the type of motion information to be estimated and the way context...
Abstract. The use of sparse invariant features to recognise classes of actions or objects has become common in the literature. However, features are often "engineered" to...