This work provides a framework for learning sequential attention in real-world visual object recognition, using an architecture of three processing stages. The first stage rejects...
Classifying an event captured in an image is useful for understanding the contents of the image. The captured event provides context to refine models for the presence and appearan...
We present a method to learn object class models from unlabeled and unsegmented cluttered scenes for the purpose of visual object recognition. We focus on a particular type of mode...
In this paper we propose a novel approach to introducing semantic relations into the bag-of-words framework. We use the latent semantic models, such as LSA and pLSA, in order to d...
In the real visual world, the number of categories a classifier needs to discriminate is on the order of hundreds or thousands. For example, the SUN dataset [24] contains 899 sce...