Abstract. This paper presents an automatic approach to mining collections of maps from the Web. Our method harvests images from the Web and then classifies them as maps or non-map...
HyperScout, a Web application, is an intermediary between a server and a client. It intercepts a page on its way to the client, gathers information on each link, and annotates each link with...
Crawling the Web is deceptively simple: the basic algorithm is (a) Fetch a page (b) Parse it to extract all linked URLs (c) For all the URLs not seen before, repeat (a)–(c). Howev...
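The basic algorithm in steps (a)–(c) can be sketched as a breadth-first loop over a frontier of unseen URLs. In this sketch the `fetch` callable and the in-memory example "site" are hypothetical stand-ins for a real HTTP fetcher, so the loop can be exercised without network access:

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl: (a) fetch a page, (b) parse out its links,
    (c) enqueue every URL not seen before, and repeat."""
    seen = {seed}
    queue = deque([seed])
    order = []                      # pages successfully fetched, in crawl order
    while queue and len(order) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)       # (a) fetch the page
        except Exception:
            continue                # skip unreachable pages
        order.append(url)
        parser = LinkParser()
        parser.feed(html)           # (b) parse it to extract linked URLs
        for href in parser.links:
            absolute = urljoin(url, href)
            if absolute not in seen:    # (c) only follow unseen URLs
                seen.add(absolute)
                queue.append(absolute)
    return order

# Offline demo against a hypothetical three-page site:
pages = {
    "http://example.com/":  '<a href="/a">A</a><a href="/b">B</a>',
    "http://example.com/a": '<a href="/">home</a>',
    "http://example.com/b": '<a href="/a">A</a>',
}
print(crawl("http://example.com/", pages.__getitem__))
```

A production crawler layers much more on top of this loop (politeness delays, robots.txt, URL canonicalization, distributed frontiers), which is exactly where the "deceptively simple" part ends.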
This paper describes a decentralized peer-to-peer model for building a Web crawler. Most of the current systems use a centralized client-server model, in which the crawl is done by...
This paper presents our approach to inferring communities on the Web. It delineates the sub-culture hierarchies based on how individuals get involved in the dispersion of online o...