This paper describes an approach to retrieving images that contain specific objects, scenes, or buildings. The image content is captured by a set of local features. More precisely, we use so-called invariant regions: features whose shapes self-adapt to the viewpoint. The physical parts of the object surface that they carve out are the same in all views, even though the extraction proceeds from a single view only. The surface patterns within the regions are then characterized by a feature vector of moment invariants. These moments are invariant under affine geometric deformations and under scaling of the color bands with an offset added. This allows regions from different views to be matched efficiently. An indexing technique based on a Vantage Point Tree organizes the feature vectors in such a way that a naive sequential search can be avoided. This results in sublinear computation times for retrieving images from a database. In order to get sufficient certainty about the correctness of the retrieved images, a...