A Java Implementation of an Extended Word Alignment Algorithm Based on the IBM Models


Download ltrc.iiit.ac.in

In recent years, statistical word alignment models have been widely used for a variety of Natural Language Processing (NLP) problems. In this paper we describe a platform-independent, object-oriented implementation (in Java) of a word alignment algorithm based on the first three IBM models. This is ongoing work in which we explore possible enhancements to the IBM models, especially for related languages such as the Indian languages. We have been able to improve performance by introducing a similarity measure (the Dice coefficient) and by using a list of cognates and a morphological analyzer. Information about cognates is especially relevant for Indian languages because these languages share many borrowed and inherited words. For our experiments on English-Hindi word alignment, we also tried using a bilingual dictionary to bootstrap the Expectation Maximization (EM) algorithm. After training on 7399 sentence aligned...
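The abstract names the Dice coefficient as the similarity measure but does not say how it is combined with the IBM model probabilities. As a rough illustration only, the sketch below computes the standard co-occurrence form, Dice(e, f) = 2 · C(e, f) / (C(e) + C(f)), where C(e, f) counts sentence pairs containing both words. The class name `DiceSketch`, its methods, and the toy English-Hindi sentence pairs are all assumptions made for this example and are not taken from the paper's implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not the authors' code: sentence-level Dice coefficient
// between source and target words of a sentence-aligned corpus.
final class DiceSketch {
    private final Map<String, Integer> srcCount = new HashMap<>();  // C(e)
    private final Map<String, Integer> tgtCount = new HashMap<>();  // C(f)
    private final Map<String, Integer> pairCount = new HashMap<>(); // C(e, f)

    private static String key(String e, String f) {
        return e + "\t" + f;
    }

    // Count each word (and each word pair) at most once per sentence pair.
    void addSentencePair(List<String> src, List<String> tgt) {
        Set<String> s = new HashSet<>(src);
        Set<String> t = new HashSet<>(tgt);
        for (String e : s) srcCount.merge(e, 1, Integer::sum);
        for (String f : t) tgtCount.merge(f, 1, Integer::sum);
        for (String e : s) {
            for (String f : t) {
                pairCount.merge(key(e, f), 1, Integer::sum);
            }
        }
    }

    // Dice(e, f) = 2 * C(e, f) / (C(e) + C(f)); 0.0 if neither word was seen.
    double dice(String e, String f) {
        int ce = srcCount.getOrDefault(e, 0);
        int cf = tgtCount.getOrDefault(f, 0);
        if (ce + cf == 0) return 0.0;
        int cef = pairCount.getOrDefault(key(e, f), 0);
        return 2.0 * cef / (ce + cf);
    }

    public static void main(String[] args) {
        DiceSketch d = new DiceSketch();
        // Toy data with transliterated Hindi, made up for illustration.
        d.addSentencePair(Arrays.asList("the", "house"), Arrays.asList("ghar"));
        d.addSentencePair(Arrays.asList("the", "book"), Arrays.asList("kitaab"));
        d.addSentencePair(Arrays.asList("house"), Arrays.asList("ghar"));
        System.out.println("house/ghar: " + d.dice("house", "ghar"));
        System.out.println("the/ghar:   " + d.dice("the", "ghar"));
    }
}
```

In this toy corpus "house" and "ghar" always co-occur, so they score higher than "the" and "ghar". One plausible use of such scores, again an assumption, is to seed or re-weight the translation probabilities before EM training.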

G. Chinnappa, Anil Kumar Singh

Artificial Intelligence | IICAI 2007 | Indian Languages | Word Alignment | Word Alignment Tool

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2007
Where	IICAI
Authors	G. Chinnappa, Anil Kumar Singh
