To support rule-writers, we are developing techniques to automatically analyze large number of public comments on proposed regulations. A document is analyzed in various ways incl...
Today, valuable business information is increasingly stored as unstructured data (documents, emails, etc.). For example, documents exchanged between business partners capture info...
This paper presents a framework for understanding tables of contents (TOC) of books, journals, and magazines. We propose a universal logical structure representation in terms of a...
We aim to characterize the comparability of corpora, we address this issue in the trilingual context through the distinction of expert and non expert documents. We work separately...
This paper presents a new approach to text processing, based on textemes. These are atomic text units generalising the concepts of character and glyph by merging them in a common ...