Sciweavers

User-centric Web crawling
KDD
2002
ACM
Collaborative crawling: mining user experiences for topical resource discovery
The rapid growth of the world wide web has made the problem of topic-specific resource discovery an important one in recent years. In this problem, it is desired to find web pages wh...
Charu C. Aggarwal
WWW
2003
ACM
Efficient URL caching for world wide web crawling
Crawling the web is deceptively simple: the basic algorithm is (a) Fetch a page (b) Parse it to extract all linked URLs (c) For all the URLs not seen before, repeat (a)–(c). Howev...
Andrei Z. Broder, Marc Najork, Janet L. Wiener
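The three-step loop quoted in the abstract above can be sketched as a breadth-first crawl. This is a minimal illustration under stated assumptions, not the authors' implementation: `fetch` and `extract_links` are hypothetical caller-supplied functions, the "seen before" test is a plain in-memory Python set (the paper's actual subject, caching that test efficiently at scale, is not modeled here), and `max_pages` is an illustrative cap.

```python
from collections import deque

def crawl(seed, fetch, extract_links, max_pages=1000):
    """Breadth-first crawl following steps (a)-(c) of the basic algorithm.

    fetch and extract_links are hypothetical hooks supplied by the caller;
    the URL-seen test is a simple in-memory set (an assumption, not the
    caching scheme studied in the paper).
    """
    seen = {seed}              # every URL ever discovered
    frontier = deque([seed])   # URLs waiting to be fetched, in FIFO order
    order = []                 # URLs in the order they were fetched
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        page = fetch(url)      # step (a): fetch the page
        order.append(url)
        if page is None:       # unreachable or missing page: nothing to parse
            continue
        for link in extract_links(page):   # step (b): extract linked URLs
            if link not in seen:           # step (c): queue only unseen URLs
                seen.add(link)
                frontier.append(link)
    return order

# Hypothetical toy web: each "page" is simply its list of outgoing links.
WEB = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["d"],
    "d": [],
}

print(crawl("a", WEB.get, lambda links: links))  # ['a', 'b', 'c', 'd']
```

Because each URL enters `seen` the moment it is discovered, the back-link from `b` to `a` and the duplicate link to `d` are never fetched twice. At web scale that membership test is the bottleneck, which is the motivation the paper's title points to.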
IR
2008
Focused web crawling in the acquisition of comparable corpora
CLIR resources, such as dictionaries and parallel corpora, are scarce for special domains. Obtaining comparable corpora automatically for such domains could be an answer to this p...
Tuomas Talvensaari, Ari Pirkola, Kalervo Järv...
CIKM
2010
Springer
Crawling the web for structured documents
Structured Information Retrieval has been gaining a lot of interest in recent years, as this kind of information is becoming an invaluable asset for professional communities such as Sof...
Julián Urbano, Juan Loréns, Yorgos A...
HICSS
1999
IEEE
Collaborative Web Crawling: Information Gathering/Processing over Internet
The main objective of the IBM Grand Central Station (GCS) is to gather information in virtually any format (text, data, image, graphics, audio, video) from the cyberspace...
Shang-Hua Teng, Qi Lu, Matthias Eichstaedt, Daniel...