Since the advent of XML, transforming documents with transformation languages such as XSLT has become an important challenge. However, writing a transformation script...
The extraction of textual content from colour documents of a graphical nature is a complicated task. The text can be rendered in any colour, size and orientation while the existen...
In this paper we present a set of experiments using large digitized collections of books to show that logical structures can be extracted with good quality when working at ...
This paper presents a new representation and evaluation procedure for page segmentation algorithms and uses it to analyze six widely used layout analysis algorithms. The ...
Unveiled in late 2004, Google Book Search is an ambitious program to make all the world's books discoverable online. The sheer scale of the problem brings a number of unique ...