Sciweavers



CIKM
2003
Springer


Statistical transliteration for English-Arabic cross language information retrieval


Download ciir.cs.umass.edu

Out-of-vocabulary (OOV) words are problematic for cross-language information retrieval. When the two languages have different alphabets, one way to deal with OOV words is to transliterate them, that is, to render them in the orthography of the second language. In this study, we present a simple statistical technique to train an English-to-Arabic transliteration model from pairs of names. We call this a selected n-gram model because a two-stage training procedure first learns which n-gram segments should be added to the unigram inventory for the source language, and then learns the translation model over this inventory. This technique requires no heuristics or linguistic knowledge of either language. We evaluate the statistically trained model and a simpler hand-crafted model on a test set of named entities from the Arabic AFP corpus and demonstrate that they perform better than two online translation sources. We also explore the effectiveness of thes...
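
The two-stage idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the training pairs, the Buckwalter-style ASCII romanization of the Arabic side, the precomputed character alignments (`links`), the `min_count` threshold, and all function names are assumptions made for illustration. Stage 1 selects source n-grams whose characters all align to a single target character; stage 2 re-segments the source names with that inventory and estimates P(target | source segment) by relative frequency.

```python
from collections import Counter, defaultdict

# Toy training data (hypothetical): (source name, target name, links).
# The target side is an ASCII romanization of Arabic, used only for readability.
# links[i] is the target index aligned to source character i, or None.
# A real system would obtain such alignments from an alignment tool.
PAIRS = [
    ("sharif", "$ryf", [0, 0, None, 1, 2, 3]),
    ("shadi", "$Ady", [0, 0, 1, 2, 3]),
    ("rashid", "r$yd", [0, None, 1, 1, 2, 3]),
    ("samir", "smyr", [0, None, 1, 2, 3]),
]


def select_ngrams(pairs, min_count=2):
    """Stage 1: keep source n-grams whose characters all align to one target character."""
    counts = Counter()
    for source, _target, links in pairs:
        start = 0
        for i in range(1, len(source) + 1):
            # Close the current run when the aligned target index changes.
            if i == len(source) or links[i] != links[start]:
                if links[start] is not None and i - start > 1:
                    counts[source[start:i]] += 1
                start = i
    return {ngram for ngram, c in counts.items() if c >= min_count}


def segment(word, inventory):
    """Greedy longest-match segmentation; returns (start, end) spans."""
    max_len = max((len(g) for g in inventory), default=1)
    spans, i = [], 0
    while i < len(word):
        for n in range(max_len, 0, -1):
            piece = word[i:i + n]
            if n == 1 or piece in inventory:
                spans.append((i, i + len(piece)))
                break
        i = spans[-1][1]
    return spans


def train(pairs, inventory):
    """Stage 2: estimate P(target | source segment) by relative frequency."""
    counts = defaultdict(Counter)
    for source, target, links in pairs:
        for start, end in segment(source, inventory):
            # Distinct target positions linked from this segment, in order.
            idx = sorted({j for j in links[start:end] if j is not None})
            counts[source[start:end]]["".join(target[j] for j in idx)] += 1
    return {seg: {t: c / sum(ctr.values()) for t, c in ctr.items()}
            for seg, ctr in counts.items()}


def transliterate(word, inventory, model):
    """Emit the most probable target string per segment; unseen segments are skipped."""
    out = []
    for start, end in segment(word, inventory):
        options = model.get(word[start:end])
        if options:
            out.append(max(options, key=options.get))
    return "".join(out)


inventory = select_ngrams(PAIRS)
model = train(PAIRS, inventory)
print(inventory)
print(transliterate("shamir", inventory, model))
```

On this toy data only "sh" is selected as an extra segment, so an unseen name such as "shamir" is segmented as sh, a, m, i, r before each segment is mapped to its most probable target string. Picking the single best option per segment is a simplification; a retrieval system could instead keep several scored candidates.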

Nasreen Abdul Jaleel, Leah S. Larkey


CIKM 2003 | Cross Language Information Retrieval | Cross Language IR | OOV Words


Related Content

» Curate a transliteration corpus from transliteration-translation pairs

» Enhanced Query Expansion in English-Arabic CLIR

» Combining resources with confidence measures for cross language information retrieval

» They Are Out There, If You Know Where to Look: Mining Transliterations of OOV Query Terms fo...

» Hindi to English and Marathi to English Cross Language Information Retrieval Evaluation

» Foreign Name Backward Transliteration in Chinese-English Cross-Language Image Retrieval

» Cross-Language Information Retrieval for Technical Documents

» Japanese/English Cross-Language Information Retrieval: Exploration of Query Translation and T...

» Applying a Dynamic Bayesian Network Framework to Transliteration Identification

Post Info

Added	06 Jul 2010
Updated	06 Jul 2010
Type	Conference
Year	2003
Where	CIKM
Authors	Nasreen Abdul Jaleel, Leah S. Larkey
