
CIKM
2003
Springer

Statistical transliteration for English-Arabic cross language information retrieval

Out-of-vocabulary (OOV) words are problematic for cross-language information retrieval. When the two languages have different alphabets, one way to deal with OOV words is to transliterate them, that is, to render them in the orthography of the second language. In this study, we present a simple statistical technique for training an English-to-Arabic transliteration model from pairs of names. We call this a selected n-gram model because it uses a two-stage training procedure: the first stage learns which n-gram segments should be added to the unigram inventory for the source language, and the second stage learns the translation model over this inventory. This technique requires no heuristics or linguistic knowledge of either language. We evaluate the statistically trained model and a simpler hand-crafted model on a test set of named entities from the Arabic AFP corpus and demonstrate that both perform better than two online translation sources. We also explore the effectiveness of thes...
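The abstract does not spell out how a trained model is applied, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a segment-level table mapping each English inventory unit (single letters plus selected n-grams such as "sh") to Arabic strings with probabilities P(arabic | english). The table `TOY_MODEL` and its values are hypothetical and invented for illustration; in the paper such probabilities would come from the two-stage training. Segmentation by greedy longest match is also an assumption.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy model: English unit -> {Arabic string: P(arabic | english)}.
# "sh" plays the role of a selected n-gram added to the unigram inventory.
# An empty string means the letter may be dropped (e.g. unwritten short vowels).
# All values are made up for illustration, NOT learned from data.
TOY_MODEL = {
    "sh": {"ش": 0.9, "سه": 0.1},
    "s": {"س": 0.8, "ص": 0.2},
    "a": {"ا": 0.6, "": 0.4},
    "d": {"د": 0.7, "ض": 0.3},
    "i": {"ي": 0.8, "": 0.2},
}


def segment(word, model):
    """Split `word` into inventory units using greedy longest match (an assumption)."""
    max_len = max(len(unit) for unit in model)
    units, i = [], 0
    while i < len(word):
        # Try the longest candidate unit first, falling back to shorter ones.
        for n in range(min(max_len, len(word) - i), 0, -1):
            piece = word[i:i + n]
            if piece in model:
                units.append(piece)
                i += n
                break
        else:
            raise ValueError(f"no inventory unit covers {word[i]!r}")
    return units


def transliterate(word, model, k=3):
    """Return the k most probable Arabic renderings with their scores."""
    units = segment(word.lower(), model)
    scores = defaultdict(float)
    # Enumerate every combination of per-unit choices (fine for short names;
    # a beam search would be needed for long inputs).
    for choice in product(*(model[u].items() for u in units)):
        prob = 1.0
        for _, p in choice:
            prob *= p  # independence across segments is assumed
        target = "".join(t for t, _ in choice)
        scores[target] += prob  # different paths can yield the same string
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

With these toy numbers, `transliterate("Shadi", TOY_MODEL)` segments the name as `["sh", "a", "d", "i"]` and ranks "شادي" first. Summing over paths that produce the same output string reflects the fact that dropped vowels can make distinct segment choices collapse to one candidate.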
Nasreen Abdul Jaleel, Leah S. Larkey
Type: Conference
Year: 2003
Where: CIKM
Authors: Nasreen Abdul Jaleel, Leah S. Larkey