abstraction for modeling these problems is to view the Web as a collection of (usually small and heterogeneous) databases, and to view programs that extract and process Web data au...
Summarization of text documents is increasingly important with the amount of data available on the Internet. The large majority of current approaches view documents as linear sequ...
We present the current version of STRIDER1 , a versatile system for the disambiguation of structure-based information like XML schemas, structures of XML documents and web director...
We study the problem of creating highly compressed fulltext index structures for versioned document collections, that is, collections that contain multiple versions of each docume...
Web pages are usually highly structured documents. In some documents, content with different functionality is laid out in blocks, some merely supporting the main discourse. In ot...