Search Sciweavers | Sciweavers

472 search results - page 5 / 95

» Crawling the Hidden Web

IC
2004

82 views, Applied Computing

IPMicra: An IP-address based Location Aware Distributed Web Crawler

15 years 8 months ago

Download www.l3s.de

Distributed crawling is able to overcome important limitations of the traditional single-sourced web crawling systems. However, the optimal benefit of distributed crawling is usual...

Odysseas Papapetrou, George Samaras

ICDM
2008
IEEE

186 views, Data Mining

xCrawl: A High-Recall Crawling Method for Web Mining

16 years 1 months ago

Download ls13-www.cs.uni-dortmund.de

Web Mining Systems exploit the redundancy of data published on the Web to automatically extract information from existing web documents. The first step in the Information Extract...

Kostyantyn M. Shchekotykhin, Dietmar Jannach, Gerh...

COOPIS
2004
IEEE

108 views, Information Technology

Minimizing the Network Distance in Distributed Web Crawling

15 years 10 months ago

Download softsys.cs.uoi.gr

Abstract. Distributed crawling has shown that it can overcome important limitations of the centralized crawling paradigm. However, the distributed nature of current distributed cra...

Odysseas Papapetrou, George Samaras

WWW
2006
ACM

138 views, Internet Technology

Geographically focused collaborative crawling

16 years 7 months ago

Download www2006.org

A collaborative crawler is a group of crawling nodes, in which each crawling node is responsible for a specific portion of the web. We study the problem of collecting geographical...

Weizheng Gao, Hyun Chul Lee, Yingbo Miao

DMKD
2004
ACM

121 views, Data Mining

Discovery of ads web hosts through traffic data analysis

15 years 10 months ago

Download making.csie.ndhu.edu.tw

One of the most pressing problems in web crawling...

V. Bacarella, Fosca Giannotti, Mirco Nanni, Dino P...