Sciweavers

102 search results - page 16 / 21
» Agent-Based Approach for Web Crawling
Sort
View
WWW
2009
ACM
14 years 8 months ago
Triplify: light-weight linked data publication from relational databases
In this paper we present Triplify ? a simplistic but effective approach to publish Linked Data from relational databases. Triplify is based on mapping HTTP-URI requests onto relat...
Sören Auer, Sebastian Dietzold, Jens Lehmann,...
SIGIR
2005
ACM
14 years 1 months ago
Server selection methods in hybrid portal search
The TREC .GOV collection makes a valuable web testbed for distributed information retrieval methods because it is naturally partitioned and includes 725 web-oriented queries with ...
David Hawking, Paul Thomas
ECIR
2006
Springer
13 years 9 months ago
Automatic Document Organization in a P2P Environment
Abstract. This paper describes an efficient method to construct reliable machine learning applications in peer-to-peer (P2P) networks by building ensemble based meta methods. We co...
Stefan Siersdorfer, Sergej Sizov
CIKM
2009
Springer
14 years 2 months ago
Graph-based seed selection for web-scale crawlers
One of the most important steps in web crawling is determining the starting points, or seed selection. This paper identifies and explores the problem of seed selection in webscal...
Shuyi Zheng, Pavel Dmitriev, C. Lee Giles
KDD
2008
ACM
183views Data Mining» more  KDD 2008»
14 years 8 months ago
De-duping URLs via rewrite rules
A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...
Anirban Dasgupta, Ravi Kumar, Amit Sasturkar