Sciweavers

295 search results
» Web Crawling
WWW
2006
ACM
14 years 3 months ago
Do not crawl in the DUST: different URLs with similar text
We consider the problem of DUST: Different URLs with Similar Text. Such duplicate URLs are prevalent in web sites, as web server software often uses aliases and redirections, and...
Uri Schonfeld, Ziv Bar-Yossef, Idit Keidar
WWW
2001
ACM
14 years 10 months ago
Crawling the Hidden Web
Current-day crawlers retrieve content only from the publicly indexable Web, i.e., the set of Web pages reachable purely by following hypertext links, ignoring search forms and pag...
Sriram Raghavan, Hector Garcia-Molina
SIGIR
2008
ACM
13 years 9 months ago
Compressed collections for simulated crawling
Collections are a fundamental tool for reproducible evaluation of information retrieval techniques. We describe a new method for distributing the document lengths and term counts ...
Alessio Orlandi, Sebastiano Vigna
WWW
2007
ACM
14 years 10 months ago
Crawling multiple UDDI business registries
As Web services proliferate, size and magnitude of UDDI Business Registries (UBRs) are likely to increase. The ability to discover Web services of interest then across multiple UB...
Eyhab Al-Masri, Qusay H. Mahmoud
WWW
2005
ACM
14 years 10 months ago
Crawling a country: better strategies than breadth-first for web page ordering
Ricardo A. Baeza-Yates, Carlos Castillo, Mauricio ...