A semi-structured information space consists of multiple collections of textual documents containing fielded or tagged sections. The space can be highly heterogeneous, because eac...
A web-portal providing access to over 250.000 scanned and OCRed cultural heritage documents is analyzed. The collection consists of the complete Dutch Hansard from 1917 to 1995. E...
Snippets are used by almost every text search engine to complement ranking scheme in order to effectively handle user searches, which are inherently ambiguous and whose relevance ...
Namespaces are a central building block of XML technologies today, they provide the identification mechanism for many XML-related vocabularies. Despite their ubiquity, there is no...
There has been a great amount of work on query-independent summarization of documents. However, due to the success of Web search engines query-specific document summarization (que...