The Web is a dynamic, ever-changing collection of information. This paper explores changes in Web content by analyzing a crawl of 55,000 Web pages, selected to represent different...
Eytan Adar, Jaime Teevan, Susan T. Dumais, Jonatha...
We initiate the study of local, sublinear time algorithms for finding vertices with extreme topological properties -- such as high degree or clustering coefficient -- in large so...
The web crawler space is often delimited into two general areas: full-web crawling and focused crawling. We present netSifter, a crawler system which integrates features from thes...
Distributed crawling has shown that it can overcome important limitations of the centralized crawling paradigm. However, the distributed nature of current distributed cra...
There is a great amount of information on the web that cannot be accessed by conventional crawler engines. This portion of the web is usually called hidden web data. To be able t...