Collaborative Web Crawling: Information Gathering/Processing over Internet

The main objective of the IBM Grand Central Station (GCS) is to gather information in virtually any format (text, data, image, graphics, audio, video) from cyberspace, to process, index, and summarize that information, and to push the right information to the right people. Because of the very large scale of cyberspace, parallel processing is indispensable, both in crawling/gathering and in information processing. In this paper, we present a scalable method for collaborative web crawling and information processing. The method includes an automatic cyberspace partitioner designed to dynamically balance and re-balance the load among processors. It can be used when all web crawlers are located on a tightly coupled high-performance system as well as when they are scattered across a distributed environment. We have implemented our algorithms in Java.
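The abstract does not spell out how the partitioner assigns URLs to crawlers. As a point of reference only, the sketch below shows one common baseline for collaborative crawling: statically hashing each URL's host onto one of N crawler processors. The class and method names (CyberspacePartitioner, assign) are hypothetical, and unlike the paper's partitioner this static scheme cannot re-balance load dynamically.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch: statically hash each URL's host onto one of N
// crawler processors. The GCS partitioner described in the paper
// additionally re-balances load dynamically; that logic is not shown.
public class CyberspacePartitioner {
    private final int numProcessors;

    public CyberspacePartitioner(int numProcessors) {
        this.numProcessors = numProcessors;
    }

    /** Map a URL to the processor responsible for crawling it. */
    public int assign(String url) throws URISyntaxException {
        String host = new URI(url).getHost();
        if (host == null) {
            host = url; // fall back to the raw string for odd URLs
        }
        // Math.floorMod keeps the index non-negative even when
        // hashCode() is negative.
        return Math.floorMod(host.hashCode(), numProcessors);
    }

    public static void main(String[] args) throws URISyntaxException {
        CyberspacePartitioner p = new CyberspacePartitioner(4);
        System.out.println(p.assign("http://www.ibm.com/research"));
        System.out.println(p.assign("http://www.hicss.org/program"));
    }
}
```

Hashing by host keeps every URL of a site on the same crawler, which simplifies per-site politeness and duplicate detection; the cost is that a hot host can overload a single processor, which is exactly the imbalance the paper's dynamic re-balancing is meant to address.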
Type Conference
Year 1999
Where HICSS
Publisher IEEE
Authors Shang-Hua Teng, Qi Lu, Matthias Eichstaedt, Daniel Alexander Ford, Tobin J. Lehman