Clustering web search engine results for ambiguous keyword searches poses unique challenges. First, we show that one cannot readily import the frequency-based feature ranking to c...
Subthreshold-seeking behavior occurs when the majority of the points that an algorithm samples have an evaluation less than some target threshold. We characterize sets of functions...
All pairs similarity search is the problem of finding all pairs of records that have a similarity score above the specified threshold. Many real-world systems like search engine...
In this paper, we present a new cost model for nearest neighbor search in high-dimensional data space. We first analyze different nearest neighbor algorithms, present a generaliza...
In this paper, we propose a novel document retrieval model in which text queries are augmented with multi-dimensional taxonomy restrictions. These restrictions may be relaxed at a ...
Marcus Fontoura, Vanja Josifovski, Ravi Kumar, Chr...