We present a new framework for characterizing and retrieving objects in cluttered scenes. This content-based image retrieval (CBIR) system is based on a new representation that describes each object through the local properties of its parts and their mutual spatial relations, without relying on accurate segmentation. For this purpose, we use a new multi-dimensional histogram that measures the joint distribution of local properties and relative spatial positions. Instead of using a single descriptor for the whole image, we represent the image by a set of histograms covering the object from different perspectives. We integrate this representation into a complete framework with two stages. The first stage enables efficient retrieval based on the geometric properties (shape) of objects in cluttered images. This is achieved by i) using a contextual descriptor that incorporates the distribution of local structures, and ii) using a suitable distance that disregards image clutter. At a second st...
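
As a rough illustration of the first-stage idea, and not the authors' implementation, the sketch below builds a joint histogram over relative position (radius and angle around a reference point) and one local property (edge orientation), then compares histograms with a one-sided distance that ignores extra mass contributed by clutter. The function names, the bin counts, the choice of orientation as the local property, and the one-sided distance itself are all assumptions made for this example.

```python
import numpy as np


def joint_histogram(points, orientations, ref,
                    n_r=4, n_theta=8, n_o=4, r_max=1.0):
    """Joint histogram over (radius, angle, local orientation) around `ref`.

    Hypothetical descriptor: `points` are 2-D feature locations and
    `orientations` are their local edge orientations in radians.
    """
    points = np.asarray(points, dtype=float)
    orientations = np.asarray(orientations, dtype=float)
    d = points - np.asarray(ref, dtype=float)
    r = np.hypot(d[:, 0], d[:, 1])
    # Drop the reference point itself and anything outside the support.
    keep = (r > 0) & (r <= r_max)
    theta = np.mod(np.arctan2(d[keep, 1], d[keep, 0]), 2 * np.pi)
    orient = np.mod(orientations[keep], np.pi)
    sample = np.column_stack([r[keep], theta, orient])
    hist, _ = np.histogramdd(
        sample,
        bins=(n_r, n_theta, n_o),
        range=((0.0, r_max), (0.0, 2 * np.pi), (0.0, np.pi)),
    )
    return hist  # raw counts, shape (n_r, n_theta, n_o)


def describe(points, orientations, refs, **kw):
    """Set of histograms, one per reference point (one 'perspective' each)."""
    return [joint_histogram(points, orientations, ref, **kw) for ref in refs]


def one_sided_distance(query_hist, scene_hist):
    """Fraction of query mass that is missing from the scene histogram.

    Assumed clutter-tolerant distance: extra counts in the scene
    (for example from background clutter) are not penalized.
    """
    missing = np.clip(query_hist - scene_hist, 0.0, None)
    return missing.sum() / max(query_hist.sum(), 1.0)
```

Calling `describe` with several reference points yields the set of histograms for one image. Because the distance only penalizes query mass that is missing from the scene, adding clutter points to the scene cannot increase the query-to-scene distance.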