We extend the constellation model to include heterogeneous parts that may represent either the appearance or the geometry of a region of the object. The parts and their spatial configuration are learnt simultaneously and automatically, without supervision, from cluttered images. We describe how this model can be employed for ranking the output of an image search engine when searching for object categories. It is shown that visual consistencies in the output images can be identified, and then used to rank the images according to their closeness to the visual object category. Although the proportion of good images may be small, the algorithm is designed to be robust and is capable of learning either in a totally unsupervised manner or with a very limited amount of supervision. We demonstrate the method on image sets returned by Google's image search for a number of object categories including bottles, camels, cars, horses, tigers and zebras.