In this paper we present a clustering and indexing paradigm called Clindex for high-dimensional search spaces. The scheme is designed for approximate similarity searches, where one wishes to find many of the data points near a target point, but where one can tolerate missing a few near points. For such searches, our scheme can find near points with high recall in very few I/Os and perform significantly better than other approaches. Our scheme is based on finding clusters, and then
Chen Li, Edward Y. Chang, Hector Garcia-Molina, Gi