We demonstrate the merits of using inter-document similarities for federated search. Specifically, we study a resultsmerging method that utilizes information induced from cluster...
Given a large-scale linked document collection, such as a collection of blog posts or a research literature archive, there are two fundamental problems that have generated a lot o...
Query-independent features (also called document priors), such as the number of incoming links to a document, its Page-Rank, or the type of its associated URL, have been successfu...
Text classification is the process of classifying documents into predefined categories based on their content. Existing supervised learning algorithms to automatically classify te...
In recent years, many algorithms for the Web have been developed that work with information units distinct from individual web pages. These include segments of web pages or aggreg...