In this paper, we present an efficient method to compute the co-occurrence counts of any pair of substrings in a parallel corpus, and an algorithm that makes use of these count...
This paper describes LINGUA - an architecture for text processing in Bulgarian. First, the pre-processing modules for tokenisation, sentence splitting, paragraph segmentation, par...
Compounding is a very productive process in German for forming complex nouns and adjectives, which represent about 7% of the words of a newspaper text. Unlike English, German compounds ...
The recently proposed method for image compression based on multi-scale recurrent patterns, the MMP (Multidimensional Multiscale Parser), has been shown to perform well for a large...
Eddie B. L. Filho, Murilo B. de Carvalho, Eduardo ...
Word searching and indexing in historical document collections is a challenging problem because characters in these documents are often touching or broken due to degradation/agei...