This paper expands on a 1997 study of the amount and distribution of near-duplicate pages on the World Wide Web. We downloaded a set of 150 million web pages on a weekly basis ove...
We consider the setting of a web server that receives requests for documents from clients, and returns the requested documents over a multicast/broadcast channel. We compare the q...
The eXtensible Markup Language (XML) is not only a language for communication between humans and the web but also a language for communication between programs. Rather th...
We present an incremental algorithm for building a neighborhood graph from a set of documents. This algorithm is based on a population of artificial agents that imitate the way re...
This paper considers a tree-rewriting framework for modeling documents evolving through service calls. We focus on the automatic verification of properties of documents that may c...