This work attempts stop word detection in the context of retrieving document images in the compressed domain. Algorithms are presented to identify text lines and words an...
This paper presents a corpus-based algorithm capable of inducing inflectional morphological analyses of both regular and highly irregular forms (such as brought → bring) from distrib...
Broad-coverage language resources that provide prior linguistic knowledge should improve the accuracy and performance of NLP applications. We are constructing a broad-coverage ...
Multilingual parallel text corpora provide a powerful means for propagating linguistic knowledge across languages. We present a model which jointly learns linguistic structure for...
We describe the compilation of a large corpus of French-Dutch sentence pairs from official Belgian documents which are available in the online version of the publication Belgisch ...