This paper expands on a 1997 study of the amount and distribution of near-duplicate pages on the World Wide Web. We downloaded a set of 150 million web pages on a weekly basis ove...
We develop an algorithm for parallel disk sorting, whose I/O cost approaches the lower bound and that guarantees almost perfect overlap between I/O and computation. Previous algor...
Despite significant research on state-space reductions, the poor scalability of model checking for reasoning about behavioral models of large, complex systems remains the chief ob...
Matthew B. Dwyer, Robby, Xianghua Deng, John Hatcl...
Abstract. The Web provides us with a vast resource for business intelligence. However, the large size of the Web and its dynamic nature make the task of foraging appropriate inform...
In the so-called Information-Explosion Era, astronomical amount of information is ubiquitously produced and digitally stored. It is getting more and more convenient for cooperative...