Although text categorization is a burgeoning area of IR research, readily available test collections in this field are surprisingly scarce. We describe a methodology and system (...
Answer patterns have been shown to improve the performance of open-domain factoid QA systems. Their use, however, requires either constructing the patterns manually or developing ...
As online document collections continue to expand, both on the Web and in proprietary environments, the need for duplicate detection becomes more critical. The goal of this work i...
This paper investigates the level of metadata accuracy required for image filters to be valuable to users. Access to large digital image and video collections is hampered by ambig...
Optimizing and focusing search and results ranking in P2P networks becomes more and more important with the increasing size of these networks. Even though a few approaches have al...
Paul-Alexandru Chirita, Wolfgang Nejdl, Oana Scurt...
It is crucial for cross-language information retrieval (CLIR) systems to deal with the translation of unknown queries1 due to that real queries might be short. The purpose of this...
We describe a method to define and use subwebs, user-defined neighborhoods of the Internet. Subwebs help improve search performance by inducing a topic-specific page relevance ...
Raman Chandrasekar, Harr Chen, Simon Corston-Olive...
We present a probabilistic model for a document corpus that combines many of the desirable features of previous models. The model is called “GaP” for Gamma-Poisson, the distri...