Search Sciweavers | Sciweavers

WWW
2007
ACM

Detecting near-duplicates for web crawling

Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...

Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma

WWW
2005
ACM

Focused crawling by exploiting anchor text using decision tree

Focused crawlers are considered a promising way to tackle the scalability problem of topic-oriented or personalized search engines. To design a focused crawler, the choice of s...

Jun Li, Kazutaka Furuse, Kazunori Yamaguchi

WWW
2008
ACM

Low-load server crawler: design and evaluation

This paper proposes a method of crawling Web servers connected to the Internet without imposing a high processing load. We are using the crawler for a field survey of the digital ...

Katsuko T. Nakahira, Tetsuya Hoshino, Yoshiki Mika...

WWW
2004
ACM

Distributed community crawling

The massive distribution of the crawling task can lead to inefficient exploration of the same portion of the Web. We propose a technique to guide crawlers' exploration based on the...

Fabrizio Costa, Paolo Frasconi

CORR
2012
Springer

Optimal Threshold Control by the Robots of Web Search Engines with Obsolescence of Documents

A typical web search engine consists of three principal parts: crawling engine, indexing engine, and searching engine. The present work aims to optimize the performance of the cra...

Konstantin Avrachenkov, Alexander N. Dudin, Valent...
