Duplicate URLs have brought serious troubles to the whole pipeline of a search engine, from crawling, indexing, to result serving. URL normalization is to transform duplicate URLs...
Tao Lei, Rui Cai, Jiang-Ming Yang, Yan Ke, Xiaodon...
Privacy is an enormous problem in online social networking sites. While sites such as Facebook allow users fine-grained control over who can see their profiles, it is difficult ...
Web search engines are traditionally evaluated in terms of the relevance of web pages to individual queries. However, relevance of web pages does not tell the complete picture, si...
The ability to pinpoint the geographic location of IP hosts is compelling for applications such as on-line advertising and network attack diagnosis. While prior methods can accurat...
Brian Eriksson, Paul Barford, Joel Sommers, Robert...
—Sampling is used as a universal method to reduce the running time of computations – the computation is performed on a much smaller sample and then the result is scaled to comp...