Abstract. This paper describes the technique for translation of character n-grams we developed for our participation in CLEF 2006. This solution avoids the need for word normalization during indexing or translation, and it can also deal with out-of-vocabulary words. Since it does not rely on language-specific processing, it can be applied to very different languages, even when linguistic information and resources are scarce or unavailable. Our proposal makes considerable use of freely available resources and also tries to achieve a higher speed during the n-gram alignment process with respect to other similar approaches. Key words: Cross-Language Information Retrieval, character n-grams, translation algorithms, alignment algorithms, association measures.
Jesús Vilares, Michael P. Oakes, John Tait