Our study of a large set of scientific applications over the past three years indicates that the processing for multidimensional datasets is often highly stylized. The basic proce...
Chialin Chang, Renato Ferreira, Alan Sussman, Joel...
Automatic text classification is an important operational problem in digital library practice. Most text classification efforts have so far concentrated on developing centralized solut...
We describe a system for rapidly determining document similarity among a set of documents obtained from an information retrieval (IR) system. We obtain a ranked list of the most i...
The nDCG measure has proven to be a popular measure of retrieval effectiveness utilizing graded relevance judgments. However, a number of different instantiations of nDCG exist, d...
In this paper we consider distributed K-Nearest Neighbor (KNN) search and range query processing in high-dimensional data. Our approach is based on Locality Sensitive Hashing (LSH...