In countries such as China or Iran where Internet censorship is prevalent, users usually rely on proxies or anonymizers to freely access the web. The obvious difficulty with this a...
We study the problem of creating highly compressed fulltext index structures for versioned document collections, that is, collections that contain multiple versions of each docume...
The information on the web is growing at a very fast pace. In this ever-accumulating data, the volume of information represented in XML format is on the rise in recent times. An o...
Abhilash Gummadi, Jong P. Yoon, Biren Shah, Vijay ...
We describe Thresher, a system that lets non-technical users teach their browsers how to extract semantic web content from HTML documents on the World Wide Web. Users specify exam...
The rapid growth of the web has been noted and tracked extensively. Recent studies have however documented the dual phenomenon: web pages have small half lives, and thus the web e...
Ziv Bar-Yossef, Andrei Z. Broder, Ravi Kumar, Andr...