As online document collections continue to expand, both on the Web and in proprietary environments, the need for duplicate detection becomes more critical. The goal of this work i...
Although text categorization is a burgeoning area of IR research, readily available test collections in this field are surprisingly scarce. We describe a methodology and system (...
In this short note we present a recommendation system for automatic retrieval of broken Web links using an approach based on contextual information. We extract information from th...
In this paper we describe techniques for the discovery and construction of user profiles. Leveraging from the emergent data web, our system addresses the problem of sparseness of ...
We present BlogScope (www.blogscope.net), a system for analyzing the Blogosphere. BlogScope is an information discovery and text analysis system that offers a set of unique featur...