We consider the problem of dust: Different URLs with Similar Text. Such duplicate URLs are prevalent in web sites, as web server software often uses aliases and redirections, and...
This work introduces a novel data mining scheme, spatial pyramid mining, to discover association rules at multiple resolutions in order to identify frequent spatial configuration...
Geography is becoming increasingly important in web search. Search engines can often return better results to users by analyzing features such as user location or geographic terms...
Qingqing Gan, Josh Attenberg, Alexander Markowetz,...
Text clustering methods can be used to structure large sets of text or hypertext documents. The well-known methods of text clustering, however, do not really address the special p...
In order to increase retrieval precision, some new search engines provide manually verified answers to Frequently Asked Queries (FAQs). An underlying task is the identification of...