The PDF format is commonly used for the exchange of documents on the Web and there is a growing need to understand and extract or repurpose data held in PDF documents. Many system...
In this paper, we extend previous work on the automatic structuring of medical documents using content analysis. Our long-term objective is to take advantage of specific rhetoric ...
Gersende Georg, Hugo Hernault, Marc Cavazza, Helmu...
This paper describes the Differential Synchronization (DS) method for keeping documents synchronized. The key feature of DS is that it is simple and well suited for use in both no...
The representation of information collections needs to be optimized for human cognition. While documents often include rich visual components, collections, including personal coll...
In this paper we propose a novel approach to markup, called Extreme Annotational RDF Markup (EARMARK), using RDF and OWL to annotate features in text content that cannot be mapped...
In this article, we introduce a new problem: the construction of multi-structured documents. We first offer an overview of existing solutions to the representation of such docum...
We study the analysis problem of XPath expressions with counting constraints. Such expressions are commonly used in document transformations or programs in which they select porti...