The k-means algorithm is the method of choice for clustering large-scale data sets and it performs exceedingly well in practice. Most of the theoretical work is restricted to the c...
Delta compression techniques are commonly used to succinctly represent an updated version of a file with respect to an earlier one. In this paper, we study the use of delta compr...
Zan Ouyang, Nasir D. Memon, Torsten Suel, Dimitre ...
The TREC 2003 web track consisted of both a non-interactive stream and an interactive stream. Both streams worked with the .GOV test collection. The non-interactive stream continu...
Nick Craswell, David Hawking, Ross Wilkinson, Ming...
In this paper, we describe a system that can extract record structures from web pages with no direct human supervision. Records are commonly occurring HTML-embedded data tuples th...
This paper presents the design and user evaluation of SmartBack, a feature that complements the standard Back button by enabling users to jump directly to key pages in their navig...