The decomposition of a document into segments such as text regions and graphics is a significant part of the document analysis process. The basic requirement for rating and impro...
We present in this article the methods we used for obtaining measures to ensure the quality and well-formedness of a text corpus. These measures allow us to determine the compatib...
String-to-string transduction is a central problem in computational linguistics and natural language processing. It occurs in tasks as diverse as name transliteration, spelling co...
While empirical evaluations are a common research method in some areas of Artificial Intelligence (AI), others still neglect this approach. This article outlines both the opportun...
The production of rich multilingual speech corpus resources on a large scale is a requirement for many linguistic, phonetic and technological tasks, in both research and applicati...