The rapid growth of the World Wide Web and the Internet has fueled interest in Web services and the Semantic Web, which are quickly becoming important parts of modern electronic commerce systems. An interesting segment of the Web services domain comprises the facilities for document manipulation, including Web search, information monitoring, data extraction, and page comparison. These services are built on common functional components that preprocess large numbers of Web pages, parsing them into internal storage and processing formats. If a Web service is to operate on the scale of the Web, it must handle this storage and processing efficiently. In this paper, we introduce Page Digest, a mechanism for efficient storage and processing of Web documents. The Page Digest design encourages a clean separation of the structural elements of Web documents from their content. Its encoding transformation provides many of the advantages of traditional string digest schemes yet remains invertible w...