The Visual Thesaurus is a new query approach for cases where no starting image is available. It is a concise representation of all similar regions in a panel of visual patches, which the user arranges according to their mental target image. Constructing the Visual Thesaurus requires a reliable region description and a clustering algorithm that reflects the variety of the database. In this paper, we develop a new region description scheme based on Harris color points of interest. We evaluate the relevance of several multi-dimensional matching metrics for measuring the similarity between regions whose signatures vary in dimension. We also outline the need for clustering to speed up the computation. Moreover, we adopt the relational clustering algorithm to categorize regions according to their Harris point-of-interest features. The generated clusters are represented by prototypes that compose the "page zero" of the Visual Thesaurus. We tested our approach o...