This paper presents a new approach to text processing, based on textemes. These are atomic text units generalising the concepts of character and glyph by merging them in a common ...
We show how the full XPath language can be compiled into a minimal subset suited for stream-based evaluation. Specifically, we show how XPath normalization into a core language a...
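The abstract is cut off before it names the minimal subset. As a hedged illustration only, assume that subset keeps forward child (`/`) and descendant (`//`) steps with name tests. The Python sketch below shows why such paths suit streaming: they can be matched in one pass over start/end events with a stack of automaton states, without building the tree. `parse_path` and `stream_match` are hypothetical names, not the authors' compiler or core language.

```python
def parse_path(path):
    """Split a path such as '/a//b/c' into (axis, name) steps.

    Only '/' (child) and '//' (descendant) steps with a name test or '*' are
    accepted: this is an assumed streamable subset, not the paper's core language.
    """
    steps = []
    i = 0
    while i < len(path):
        if path.startswith("//", i):
            axis, i = "descendant", i + 2
        elif path[i] == "/":
            axis, i = "child", i + 1
        else:
            raise ValueError(f"expected '/' at position {i}")
        j = path.find("/", i)
        if j == -1:
            j = len(path)
        if j == i:
            raise ValueError(f"empty name test at position {i}")
        steps.append((axis, path[i:j]))
        i = j
    return steps


def stream_match(path, events):
    """Return the indices of the 'start' events whose element the path selects.

    events is a sequence of ('start', name) / ('end', name) pairs, as a
    SAX-style parser would emit them. State k means "the first k steps are
    matched"; one state set is kept per open element, so memory grows with
    document depth rather than document size.
    """
    steps = parse_path(path)
    n = len(steps)
    stack = [{0}]  # states at the document node
    matches = []
    for pos, (kind, name) in enumerate(events):
        if kind == "end":
            stack.pop()  # leave the element: restore the parent's states
            continue
        new_states = set()
        for k in stack[-1]:
            if k == n:
                continue  # a completed match does not extend to children
            axis, test = steps[k]
            if test in (name, "*"):
                new_states.add(k + 1)  # this element satisfies step k
            if axis == "descendant":
                new_states.add(k)  # '//' may still match further down
        if n in new_states:
            matches.append(pos)
        stack.append(new_states)
    return matches
```

For the event stream of `<a><x><b/></x><b/></a>`, `/a//b` selects both `b` elements, while `/a/b` selects only the second one, which is a direct child of `a`. Reverse axes such as `parent::` would not fit this single forward pass, which is the usual motivation for rewriting them away before streaming.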
Recent developments in document technologies have strongly influenced the evolution of Web clients over the last fifteen years, but not all Web clients have taken the same advantag...
In this paper, we present a method for structuring a document according to the information present in its Table of Contents. The detection of the ToC as well as the determination ...
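The abstract stops before describing how ToC entries are linked to the body. The following Python sketch is a hypothetical illustration, not the paper's method. It assumes the ToC has already been detected and parsed into `(level, title)` pairs, that `lines` holds only the body text after the ToC, and that each heading appears in the body as a whole line in ToC order. Under those assumptions, each title is located by normalized exact matching, and the levels are used to nest sections. `Section` and `structure_from_toc` are names introduced here for illustration.

```python
import re
from dataclasses import dataclass, field


def normalize(text):
    """Collapse whitespace and lowercase, so minor layout differences still match."""
    return re.sub(r"\s+", " ", text).strip().lower()


@dataclass
class Section:
    title: str
    level: int
    start: int  # index of the heading line in the body
    end: int = -1  # exclusive end index, filled in when the section closes
    children: list = field(default_factory=list)


def structure_from_toc(toc, lines):
    """Build a section tree from (level, title) ToC entries and body lines.

    Hypothetical sketch: headings are assumed to be whole body lines appearing
    in ToC order; entries whose title is not found are skipped.
    """
    norm = [normalize(line) for line in lines]
    root = Section("ROOT", 0, -1, len(lines))
    stack = [root]
    pos = 0  # search only forward, since headings follow ToC order
    for level, title in toc:
        try:
            idx = norm.index(normalize(title), pos)
        except ValueError:
            continue  # heading not found in the body
        # Close every open section at the same or a deeper level.
        while stack[-1].level >= level:
            stack.pop().end = idx
        section = Section(title, level, idx)
        stack[-1].children.append(section)
        stack.append(section)
        pos = idx + 1
    # Sections still open run to the end of the body.
    while len(stack) > 1:
        stack.pop().end = len(lines)
    return root
```

With a ToC of `Introduction`, `Method` (containing `Data`) and `Results`, the sketch returns three top-level sections, with `Data` nested under `Method` and each section's line range ending where the next section of the same or a higher level begins. Real ToC pages also carry page numbers and may use titles that differ from the body headings, so exact matching here is a simplifying assumption.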
Fully automatic machine translation cannot produce high-quality translations; Dialog-Based Machine Translation (DBMT) is the only way to provide authors with a means of translating...
Structured document content reuse is the problem of restructuring and translating data structured under a source schema into an instance of a target schema. A notion closely tied ...