In recent years statistical word alignment models have been widely used for various Natural Language Processing (NLP) problems. In this paper we describe a platform independent and object oriented implementation (in Java) of a word alignment algorithm. This algorithm is based on the first three IBM models. This is an ongoing work in which we are trying to explore the possible enhancements to the IBM models, especially for related languages like the Indian languages. We have been able to improve the performance by introducing a similarity measure (Dice coefficient), using a list of cognates and morph analyzer. Use of information about cognates is especially relevant for Indian languages because these languages have a lot of borrowed and inherited words which are common to more than one language. For our experiments on English-Hindi word alignment, we also tried to use a bilingual dictionary to bootstrap the Expectation Maximization (EM) algorithm. After training on 7399 sentence aligned...
G. Chinnappa, Anil Kumar Singh