In order to return relevant search results, a search engine must keep its local repository synchronized to the Web, but it is usually impossible to attain perfect freshness. Hence...
A website can regulate search engine crawler access to its content using the robots exclusion protocol, specified in its robots.txt file. The rules in the protocol enable the site...
Web graphs are approximate snapshots of the web, created by search engines. Their creation is an error-prone procedure that relies on the availability of Internet nodes and the fa...
Panagiotis Papadimitriou 0002, Ali Dasdan, Hector ...
We investigate the effect of search engine brand (i.e., the identifying name or logo that distinguishes a product from its competitors) on evaluation of system performance. This r...
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...