The development of collaborative environments that not only manage information and communication, but also support the actual work processes of organisations is very important. XML...
The Portable Document Format (PDF) is a page-oriented, graphically rich document format based on PostScript semantics. It is the file format underlying the Adobe
This paper presents a language identification technique that detects Latin-based languages of imaged documents without OCR. The proposed technique detects languages through the wo...
Deriving a thematically meaningful partition of an unlabeled document corpus is a challenging task. In this context, the use of document representations based on latent thematic ge...
The ability of fast similarity search at large scale is of great importance to many Information Retrieval (IR) applications. A promising way to accelerate similarity search is sem...
Since the advent of XML, the ability to transform documents using transformation languages such as XSLT has become an important challenge. However, writing a transformation script...
The structure of a document has an important influence on the perception of its content. Considering scientific publications, we can affirm that by making use of the ordinary line...
Tudor Groza, Alexander Schutz, Siegfried Handschuh
We present in this paper a method for document layout analysis based on identifying the function of document elements (what they do). This approach is orthogonal and complementary...
Genre, like layout, is an important factor in effective communication, and automated tools which assist in genre compliance are thus of considerable value. Genres are reusable met...
Marc Nanard, Jocelyne Nanard, Peter R. King, Ludov...
Optimisation of real world Variable Data printing (VDP) documents is a difficult problem because the interdependencies between layout functions may drastically reduce the number o...
Alexander J. Macdonald, David F. Brailsford, Steve...