HyperScout, a Web application, is an intermediary between a server and a client. It intercepts each page on its way to the client, gathers information about each link on the page, and annotates each link with...
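The truncated abstract does not say what information each annotation carries, so the sketch below only illustrates the intercept-and-rewrite step: every anchor in a page passing through the intermediary gets per-link information attached, here as a title-attribute tooltip. The names annotate_links and describe are hypothetical, and BeautifulSoup is an assumed HTML-parsing library, not necessarily what HyperScout uses.

    from bs4 import BeautifulSoup

    def annotate_links(html, describe):
        """Rewrite a page in transit: attach gathered per-link
        information to every anchor as a tooltip (one possible
        annotation; the paper's actual annotations are unspecified)."""
        soup = BeautifulSoup(html, "html.parser")
        for anchor in soup.find_all("a", href=True):
            anchor["title"] = describe(anchor["href"])
        return str(soup)

    page = '<p><a href="http://example.com">Example</a></p>'
    print(annotate_links(page, lambda url: "points to " + url))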
Crawling the web is deceptively simple: the basic algorithm is (a) fetch a page, (b) parse it to extract all linked URLs, and (c) for all the URLs not seen before, repeat (a)-(c). Howev...
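A minimal sketch of that basic algorithm, assuming the requests and BeautifulSoup libraries; the function name and parameters are illustrative, and the complications that make real crawling hard (politeness, robots.txt, scale, traps) are omitted.

    from collections import deque
    from urllib.parse import urljoin, urldefrag

    import requests
    from bs4 import BeautifulSoup

    def crawl(seed_url, max_pages=100):
        """Breadth-first crawl: fetch, extract links, repeat for unseen URLs."""
        seen = {seed_url}
        frontier = deque([seed_url])
        fetched = 0
        while frontier and fetched < max_pages:
            url = frontier.popleft()
            try:
                resp = requests.get(url, timeout=10)  # (a) fetch a page
            except requests.RequestException:
                continue
            fetched += 1
            soup = BeautifulSoup(resp.text, "html.parser")
            for anchor in soup.find_all("a", href=True):  # (b) parse out linked URLs
                # Absolutize relative links and drop #fragments before dedup.
                link, _ = urldefrag(urljoin(url, anchor["href"]))
                if link not in seen:  # (c) repeat only for unseen URLs
                    seen.add(link)
                    frontier.append(link)
        return seen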
This paper describes a decentralized peer-to-peer model for building a Web crawler. Most current systems use a centralized client-server model, in which the crawl is done by...
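The truncated abstract does not describe how work is divided among peers, but a common device in decentralized crawlers is to hash each URL's host so that every peer owns a disjoint partition of the Web with no central coordinator. The sketch below shows that generic technique; it is not necessarily this paper's protocol, and peer_for_url is a hypothetical name.

    import hashlib
    from urllib.parse import urlparse

    def peer_for_url(url, peers):
        """Assign a URL's host to one peer by hashing, so peers crawl
        disjoint partitions without central coordination."""
        host = urlparse(url).netloc
        digest = hashlib.sha1(host.encode()).hexdigest()
        return peers[int(digest, 16) % len(peers)]

    peers = ["peer-0", "peer-1", "peer-2"]
    print(peer_for_url("http://example.com/index.html", peers))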
This paper presents our approach to inferring communities on the Web. It delineates sub-culture hierarchies based on how individuals participate in the dispersion of online o...
Recently, a number of algorithms have been proposed for analyzing hypertext link structure to determine the best "authorities" for a given topic or query. Wh...
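One canonical example of the kind of algorithm this abstract refers to is Kleinberg's HITS, which computes mutually reinforcing hub and authority scores by iterating over the link graph. The sketch below is illustrative of that general technique and is not taken from this paper.

    import math

    def hits(graph, iterations=50):
        """Kleinberg-style iteration: a page's authority score sums the
        hub scores of pages linking to it; its hub score sums the
        authority scores of pages it links to. Scores are normalized
        each round so they stay bounded."""
        nodes = set(graph) | {v for targets in graph.values() for v in targets}
        auth = {n: 1.0 for n in nodes}
        hub = {n: 1.0 for n in nodes}
        for _ in range(iterations):
            # Authority update: pages pointed to by good hubs.
            new_auth = {n: 0.0 for n in nodes}
            for u, targets in graph.items():
                for v in targets:
                    new_auth[v] += hub[u]
            norm = math.sqrt(sum(a * a for a in new_auth.values())) or 1.0
            auth = {n: a / norm for n, a in new_auth.items()}
            # Hub update: pages that point to good authorities.
            new_hub = {n: sum(auth[v] for v in graph.get(n, ())) for n in nodes}
            norm = math.sqrt(sum(h * h for h in new_hub.values())) or 1.0
            hub = {n: h / norm for n, h in new_hub.items()}
        return auth, hub

    # Tiny example graph: page -> list of pages it links to.
    graph = {"a": ["c"], "b": ["c"], "c": []}
    auth, hub = hits(graph)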
Allan Borodin, Gareth O. Roberts, Jeffrey S. Rosen...