Sciweavers

299 search results - page 12 / 60
» User-centric Web crawling
Sort
View
WWW
2006
ACM
14 years 8 months ago
What's really new on the web?: identifying new pages from a series of unstable web snapshots
Identifying and tracking new information on the Web is important in sociology, marketing, and survey research, since new trends might be apparent in the new information. Such chan...
Masashi Toyoda, Masaru Kitsuregawa
WWW
2008
ACM
14 years 8 months ago
Low-load server crawler: design and evaluation
This paper proposes a method of crawling Web servers connected to the Internet without imposing a high processing load. We are using the crawler for a field survey of the digital ...
Katsuko T. Nakahira, Tetsuya Hoshino, Yoshiki Mika...
WWW
2007
ACM
14 years 8 months ago
Crawling multiple UDDI business registries
As Web services proliferate, size and magnitude of UDDI Business Registries (UBRs) are likely to increase. The ability to discover Web services of interest then across multiple UB...
Eyhab Al-Masri, Qusay H. Mahmoud
WWW
2006
ACM
14 years 1 months ago
Do not crawl in the DUST: different URLs with similar text
We consider the problem of dust: Different URLs with Similar Text. Such duplicate URLs are prevalent in web sites, as web server software often uses aliases and redirections, and...
Uri Schonfeld, Ziv Bar-Yossef, Idit Keidar