Search Sciweavers | Sciweavers


» Scheduling Algorithms for Web Crawling


WIDM
2004
ACM


Probabilistic models for focused web crawling

16 years 2 days ago

Download users.cs.dal.ca

A focused crawler must use information gleaned from previously crawled page sequences to estimate the relevance of a newly seen URL. Therefore, good performance depends on powerfu...

Hongyu Liu, Evangelos E. Milios, Jeannette Janssen


WWW
2008
ACM


Low-load server crawler: design and evaluation

16 years 7 months ago

Download www2008.org

This paper proposes a method of crawling Web servers connected to the Internet without imposing a high processing load. We are using the crawler for a field survey of the digital ...

Katsuko T. Nakahira, Tetsuya Hoshino, Yoshiki Mika...


PVLDB
2008


Google's Deep Web crawl

15 years 6 months ago

Download www.cs.cornell.edu

The Deep Web, i.e., content hidden behind HTML forms, has long been acknowledged as a significant gap in search engine coverage. Since it represents a large portion of the structu...

Jayant Madhavan, David Ko, Lucja Kot, Vignesh Gana...


WWW
2006
ACM


Do not crawl in the DUST: different URLs with similar text

16 years 18 days ago

Download www2007.org

We consider the problem of DUST: Different URLs with Similar Text. Such duplicate URLs are prevalent in web sites, as web server software often uses aliases and redirections, and...

Uri Schonfeld, Ziv Bar-Yossef, Idit Keidar


WWW
2006
ACM


What's really new on the web?: identifying new pages from a series of unstable web snapshots

16 years 7 months ago

Download www.tkl.iis.u-tokyo.ac.jp

Identifying and tracking new information on the Web is important in sociology, marketing, and survey research, since new trends might be apparent in the new information. Such chan...

Masashi Toyoda, Masaru Kitsuregawa
