Image classification and annotation are important problems
in computer vision, but they are rarely considered together. Intuitively,
annotations provide evidence for the class label,
and the class label provides evidence for annotations. For
example, an image of class “highway” is more likely to be annotated
with the words “road,” “car,” and “traffic” than with the words
“fish,” “boat,” and “scuba.” In this paper, we develop a
new probabilistic model for jointly modeling the image, its
class label, and its annotations. Our model treats the class
label as a global description of the image, and treats annotation
terms as local descriptions of parts of the image.
Its underlying probabilistic assumptions naturally integrate
these two sources of information. We derive approximate
inference and estimation algorithms based on variational
methods, as well as efficient approximations for classifying
and annotating new images. We examine the performance
of our model on two real-worl...
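The abstract does not spell out the generative process, so the following is only a minimal sketch under explicit assumptions, not the authors' model. It assumes an LDA-style topic model with three parts. Each image region gets a latent topic, which is local. The class label is drawn from a softmax over the image's average topic assignments, which is global. Each annotation word is tied to one uniformly chosen region and drawn from that region's topic. All names (`sample_image`, `eta`, `region_dists`, `word_dists`) and the toy parameters are hypothetical and chosen for illustration. The vocabulary reuses the example words above.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class ToyImage:
    regions: list[int]                  # observed visual codeword per region
    topics: list[int]                   # latent topic per region (local)
    label: int                          # class label for the whole image (global)
    annotations: list[tuple[int, int]]  # (region index, word id) pairs


def sample_dirichlet(alpha, rng):
    # Dirichlet draw via normalized Gamma variates
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_image(alpha, region_dists, word_dists, eta, n_regions, n_words, rng):
    """Hypothetical generative process: topics are local, the label is global."""
    n_topics = len(alpha)
    theta = sample_dirichlet(alpha, rng)
    topics = rng.choices(range(n_topics), weights=theta, k=n_regions)
    regions = [rng.choices(range(len(region_dists[z])), weights=region_dists[z])[0]
               for z in topics]
    # Global label: depends on the topic frequencies over all regions
    zbar = [topics.count(k) / n_regions for k in range(n_topics)]
    scores = [sum(w * f for w, f in zip(eta_c, zbar)) for eta_c in eta]
    label = rng.choices(range(len(eta)), weights=softmax(scores))[0]
    # Local annotations: each word is attached to one region and its topic
    annotations = []
    for _ in range(n_words):
        y = rng.randrange(n_regions)
        dist = word_dists[topics[y]]
        w = rng.choices(range(len(dist)), weights=dist)[0]
        annotations.append((y, w))
    return ToyImage(regions, topics, label, annotations)


# Toy usage: 2 topics, 2 visual codewords, 6 words, 2 classes (all hypothetical)
vocab = ["road", "car", "traffic", "fish", "boat", "scuba"]
rng = random.Random(0)
img = sample_image(
    alpha=[0.5, 0.5],
    region_dists=[[0.9, 0.1], [0.1, 0.9]],
    word_dists=[[0.4, 0.3, 0.3, 0.0, 0.0, 0.0],   # topic 0: road-like words
                [0.0, 0.0, 0.0, 0.4, 0.3, 0.3]],  # topic 1: water-like words
    eta=[[4.0, -4.0], [-4.0, 4.0]],
    n_regions=10, n_words=3, rng=rng,
)
print(img.label, [vocab[w] for _, w in img.annotations])
```

Both the label and the words depend on the same region-level topic assignments. Observing one therefore shifts the posterior over those assignments and, through them, the prediction of the other. In this sketch, that shared dependence is what makes the two sources of evidence flow in both directions.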
Chong Wang, David M. Blei, Fei-Fei Li