Sciweavers

WWW
2009
ACM
14 years 8 months ago
User-centric content freshness metrics for search engines
In order to return relevant search results, a search engine must keep its local repository synchronized to the Web, but it is usually impossible to attain perfect freshness. Hence...
Ali Dasdan, Xinh Huynh
WWW
2008
ACM
14 years 8 months ago
A larger scale study of robots.txt
A website can regulate search engine crawler access to its content using the robots exclusion protocol, specified in its robots.txt file. The rules in the protocol enable the site...
Santanu Kolay
WWW
2008
ACM
14 years 8 months ago
Web graph similarity for anomaly detection (poster)
Web graphs are approximate snapshots of the web, created by search engines. Their creation is an error-prone procedure that relies on the availability of Internet nodes and the fa...
Panagiotis Papadimitriou 0002, Ali Dasdan, Hector ...
WWW
2007
ACM
14 years 8 months ago
Brand awareness and the evaluation of search results
We investigate the effect of search engine brand (i.e., the identifying name or logo that distinguishes a product from its competitors) on evaluation of system performance. This r...
Bernard J. Jansen, Mimi Zhang, Ying Zhang
WWW
2007
ACM
14 years 8 months ago
Detecting near-duplicates for web crawling
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...
Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma