We consider the problem of dust: Different URLs with Similar Text. Such duplicate URLs are prevalent in web sites, as web server software often uses aliases and redirections, and...
In this paper we propose a hierarchical clustering engine, called SnakeT, that is able to organize on-the-fly the search results drawn from 16 commodity search engines into a hier...
Today customers want to use powerful search engines for their huge and increasing content repositories. Full-text-only products with simple result lists are not enough to satisfy t...
Electronic mail poses a number of unusual challenges for the design of information retrieval systems and test collections, including informal expression, conversational structure,...
An important requirement for emerging applications which aim to locate and integrate content distributed over the Web is to identify pages that are relevant for a given domain or ...