We revisit a problem introduced by Bharat and Broder almost a decade ago: how to sample random pages from the corpus of documents indexed by a search engine, using only the search...
We consider the problem of sampling URLs uniformly at random from the Web. A tool for sampling URLs uniformly can be used to estimate various properties of Web pages, such as the ...
Monika Rauch Henzinger, Allan Heydon, Michael Mitz...
Many search engines and other web applications suggest auto-completions as the user types in a query. The suggestions are generated from hidden underlying databases, such as query...
Recent researchhas studied howto measurethe size of a searchengine, in terms of the number of pages indexed. In this paper, we consider a di erent measure for search engines, name...
Monika Rauch Henzinger, Allan Heydon, Michael Mitz...
In this paper we survey known results on algorithms, data structures, and some applications of random sampling from databases. We first discuss various reasons for sampling from d...