Image auto-annotation is an important open problem in
computer vision. For this task we propose TagProp, a discriminatively
trained nearest-neighbor model. Tags of test
images are predicted using a weighted nearest-neighbor
model that exploits labeled training images. Neighbor weights
are based on neighbor rank or distance. TagProp allows
the integration of metric learning by directly maximizing
the log-likelihood of the tag predictions in the training set.
In this manner, we can optimally combine a collection of
image similarity metrics that cover different aspects of image
content, such as local shape descriptors or global
color histograms. We also introduce a word-specific sigmoidal
modulation of the weighted neighbor tag predictions
to boost the recall of rare words. We investigate the performance
of different variants of our model and compare them to existing
work. We present experimental results for three challenging
data sets. On all three, TagProp makes a marked
impro...