Sciweavers

611 search results - page 6 / 123
» Random web crawls
ITSSA
2006
581 views
13 years 7 months ago
Agent-Based Approach for Web Crawling
Since its creation in 1990, the World Wide Web has increased the popularity of the Internet, which has become an important source of information or services for people all over the world. Th...
Maxime Wack, Mohamed Bakhouya, Jaafar Gaber
ICMCS
2009
IEEE
131 views, Multimedia
13 years 5 months ago
Web image mining using concept sensitive Markov stationary features
With the explosive growth of web resources, how to mine semantically relevant images efficiently becomes a challenging and necessary task. In this paper, we propose a concept sens...
Chunjie Zhang, Jing Liu, Hanqing Lu, Songde Ma
ADMA
2009
Springer
142 views, Data Mining
14 years 2 months ago
Crawling Deep Web Using a New Set Covering Algorithm
Crawling the deep web often requires the selection of an appropriate set of queries so that they can cover most of the documents in the data source with low cost. This ca...
Yan Wang, Jianguo Lu, Jessica Chen
WWW
2004
ACM
14 years 8 months ago
Distributed community crawling
The massive distribution of the crawling task can lead to inefficient exploration of the same portion of the Web. We propose a technique to guide crawlers' exploration based on the...
Fabrizio Costa, Paolo Frasconi
WWW
2007
ACM
14 years 8 months ago
Detecting near-duplicates for web crawling
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...
Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma