Sciweavers

Search results for: Random web crawls
WWW
2007
ACM
14 years 8 months ago
Crawling multiple UDDI business registries
As Web services proliferate, size and magnitude of UDDI Business Registries (UBRs) are likely to increase. The ability to discover Web services of interest then across multiple UB...
Eyhab Al-Masri, Qusay H. Mahmoud
WWW
2006
ACM
14 years 1 months ago
Do not crawl in the DUST: different URLs with similar text
We consider the problem of dust: Different URLs with Similar Text. Such duplicate URLs are prevalent in web sites, as web server software often uses aliases and redirections, and...
Uri Schonfeld, Ziv Bar-Yossef, Idit Keidar
PVLDB
2008
13 years 7 months ago
Google's Deep Web crawl
The Deep Web, i.e., content hidden behind HTML forms, has long been acknowledged as a significant gap in search engine coverage. Since it represents a large portion of the structu...
Jayant Madhavan, David Ko, Lucja Kot, Vignesh Gana...
SIGIR
2006
ACM
14 years 1 months ago
AggregateRank: bringing order to web sites
Since the website is one of the most important organizational structures of the Web, how to effectively rank websites has been essential to many Web applications, such as Web sear...
Guang Feng, Tie-Yan Liu, Ying Wang, Ying Bao, Zhim...
SIGIR
2003
ACM
14 years 29 days ago
Apoidea: A Decentralized Peer-to-Peer Architecture for Crawling the World Wide Web
This paper describes a decentralized peer-to-peer model for building a Web crawler. Most of the current systems use a centralized client-server model, in which the crawl is done by...
Aameek Singh, Mudhakar Srivatsa, Ling Liu, Todd Mi...