Two-dimensional and three-dimensional coordinate systems are the basic graphics symbols in many graphical documents. A robust coordinate system detection scheme is needed in order...
In machine translation, document alignment refers to finding correspondences between documents which are exact translations of each other. We define pseudo-alignment as the task...
As the rapid growth of PDF document in digital libraries, recognizing the document structure and detecting specific document components are useful for document storage, classifica...
Abstract. The purpose of information extraction (IE) is to find desired pieces of information in natural language texts and store them in a form that is suitable for automatic pro...
This paper presents a new approach to text processing, based on textemes. These are atomic text units generalising the concepts of character and glyph by merging them in a common ...