In state-of-the-art image retrieval systems, an image is
represented by a bag of visual words obtained by quantizing
high-dimensional local image descriptors, and scalable
schemes inspired by text retrieval are then applied for large-scale
image indexing and retrieval. Bag-of-words representations,
however: 1) reduce the discriminative power of
image features due to feature quantization; and 2) ignore
geometric relationships among visual words. Exploiting
such geometric constraints, by estimating a 2D affine transformation
between a query image and each candidate image,
has been shown to greatly improve retrieval precision,
but at high computational cost. In this paper, we present
a novel scheme where image features are bundled into local
groups. Each group of bundled features becomes much
more discriminative than a single feature, and within each
group simple and robust geometric constraints can be efficiently
enforced. Experiments in web image search, with a
database o...