Similarity search using distance-based index structures is increasingly applied to complex data types. It has been shown that for high-dimensional uniform vectors with similarity norms, any clustering and partitioning index method is outperformed by sequential scan. However, the clustering inherent in real data usually leads to low intrinsic dimensionality. MoBIoS (the Molecular Biological Information System) is a next-generation database management system comprising distance-based indices. Taking advantage of the generality of MoBIoS, we have built, evaluated, and optimized a prototype of a distance-based image retrieval system. We show that under a metric distance function, image data is intrinsically low-dimensional. We investigate the performance of three distance-based index structures (M-tree, RBT-index, and MVP-index), and, to optimize the construction of MVP-indices, develop new heuristics that seek centers as pivots and partition the data according to its intrinsic clustering. Last, we show...
Rui Mao, Wenguo Liu, Daniel P. Miranker, Qasim Iqb
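
The contrast the abstract draws between uniform vectors and clustered real data can be illustrated with a small sketch. It uses one common estimator of intrinsic dimensionality, rho = mu^2 / (2 sigma^2), where mu and sigma^2 are the mean and variance of the pairwise distances; this is an assumption made for illustration and not necessarily the measure used by the authors. The synthetic data, the Euclidean metric, and all function names are likewise hypothetical.

```python
import math
import random
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance, used here as a stand-in metric."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def intrinsic_dimensionality(points, dist=euclidean):
    """Estimate rho = mu^2 / (2 * sigma^2) from all pairwise distances."""
    d = [dist(a, b) for a, b in combinations(points, 2)]
    mu = sum(d) / len(d)
    var = sum((x - mu) ** 2 for x in d) / len(d)
    return mu * mu / (2 * var)


def uniform(n, dim, rng):
    """n points drawn uniformly from the unit hypercube."""
    return [[rng.random() for _ in range(dim)] for _ in range(n)]


def clustered(n, dim, rng, centers=4, spread=0.02):
    """n points scattered tightly around a few random centers."""
    cs = uniform(centers, dim, rng)
    return [[c + rng.gauss(0, spread) for c in rng.choice(cs)]
            for _ in range(n)]


rng = random.Random(0)
rho_uniform = intrinsic_dimensionality(uniform(300, 32, rng))
rho_clustered = intrinsic_dimensionality(clustered(300, 32, rng))
print(f"uniform:   rho = {rho_uniform:.1f}")
print(f"clustered: rho = {rho_clustered:.1f}")
```

Both data sets live in the same 32-dimensional space, yet the clustered one should yield a much smaller estimate, because its pairwise distances are widely spread rather than concentrated around a single value. That spread is what gives a distance-based index something to prune on.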