In this paper, we present a novel near-duplicate document detection method that can easily be tuned for a particular domain. Our method represents each document as a real-valued s...
Hannaneh Hajishirzi, Wen-tau Yih, Aleksander Kolcz
In this paper, we introduce an information theoretic method for estimating the usefulness of the hyperlink structure induced from the set of retrieved documents. We evaluate the e...
We are concerned with the use of Mobile Agents for information retrieval. A multi-agent system is considered; a number of agents are involved in a collective effort to retrieve di...
Paraphrasing van Rijsbergen [37], the time is ripe for another attempt at using natural language processing (NLP) for information retrieval (IR). This paper introduces my disserta...
Cross-language latent semantic indexing is a method that learns useful languageindependent vector representations of terms through a statistical analysis of a documentaligned text...