We introduce a text-based image feature and demonstrate
that it consistently improves performance on hard
object classification problems. The feature is built using
an auxiliary dataset of images annotated with tags, downloaded
from the internet. We do not inspect or correct the
tags and expect that they are noisy. We obtain the text feature
of an unannotated image from the tags of its k-nearest
neighbors in this auxiliary collection.
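
The construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the Euclidean distance, the normalized tag-frequency vector, the function name knn_text_feature, and the toy data are all assumptions made for illustration only.

```python
import math
from collections import Counter


def knn_text_feature(query, aux_features, aux_tags, vocabulary, k=3):
    """Build a text feature for an untagged image from its k nearest
    auxiliary images (hypothetical helper, not from the paper).

    query        -- visual feature vector of the untagged image
    aux_features -- visual feature vectors of the auxiliary images
    aux_tags     -- list of tag lists, one per auxiliary image (may be noisy)
    vocabulary   -- ordered list of tags defining the output dimensions
    """
    # Rank auxiliary images by Euclidean distance (an assumed metric).
    order = sorted(
        range(len(aux_features)),
        key=lambda i: math.dist(query, aux_features[i]),
    )
    neighbors = order[:k]

    # Count how many of the neighbors carry each tag.
    counts = Counter()
    for i in neighbors:
        counts.update(set(aux_tags[i]))

    # Normalize by the number of neighbors actually used.
    n = len(neighbors)
    if n == 0:
        return [0.0] * len(vocabulary)
    return [counts[tag] / n for tag in vocabulary]


# Toy auxiliary collection: 2-D "visual" features with noisy tags.
aux_features = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1]]
aux_tags = [["cat", "pet"], ["cat", "sofa"], ["car", "street"], ["car", "cat"]]
vocabulary = ["car", "cat", "pet", "sofa", "street"]

feature = knn_text_feature([0.05, 0.0], aux_features, aux_tags, vocabulary, k=2)
print(feature)  # [0.0, 1.0, 0.5, 0.5, 0.0]
```

In this toy case the two nearest neighbors both carry the tag "cat", so that dimension dominates even though the fourth auxiliary image has a misleading "cat" tag. The resulting vector could then be passed to any classifier alongside, or in place of, the visual features.
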
A visual classifier presented with an object viewed under
novel circumstances (say, a new viewing direction) must
rely on its visual examples. Our text feature, by contrast,
may not change, because the auxiliary dataset likely contains
a similar picture. While the tags associated with images are
noisy, they are more stable than visual features when
appearance changes.
We test the performance of this feature using the PASCAL
VOC 2006 and 2007 datasets. Our feature performs
well, consistently improves the performance of visual object
classifiers, and is particularl...
David A. Forsyth, Derek Hoiem, Gang Wang