In this paper we demonstrate that in an ideal Distributed Information Retrieval environment, taking the ability of each collection server to return relevant documents into account...
We present an efficient algorithm called the Quadtree Heuristic for identifying a list of similar terms for each unique term in a large document collection. Term similarity is de...
The TREC 2004 Terabyte Track evaluated information retrieval in large-scale text collections, using a set of 25 million documents (426 GB). This paper gives an overview of our expe...
The goal of distributed information retrieval is to support effective searching over multiple document collections. For efficiency, queries should be routed to only those collectio...
While participating in the HARD track, our first question was what an IR application should look like that takes into account preference metadata from the user, without the need ...