Capturing Out-of-Vocabulary Words in Arabic Text

The increasing flow of information between languages has led to a rise in the frequency of non-native or loan words, where terms of one language appear transliterated in another. Dealing with such out-of-vocabulary words is essential for successful cross-lingual information retrieval. For example, techniques such as stemming should not be applied indiscriminately to all words in a collection, so foreign words need to be identified before any stemming takes place. In this paper, we investigate three approaches for the identification of foreign words in Arabic text: lexicons, language patterns, and n-grams. Our results show that lexicon-based approaches outperform the other techniques.
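
To make the lexicon idea concrete, here is a minimal sketch of how a lexicon lookup could separate foreign words from native ones before stemming. It is not the authors' implementation: the tiny `NATIVE_LEXICON`, the `strip_article` normalisation, and the function names are all hypothetical and chosen only for illustration; a real system would rely on a large dictionary and proper morphological analysis.

```python
# Hypothetical toy lexicon of native Arabic word forms (assumption for
# illustration only; a real system would load a large dictionary).
NATIVE_LEXICON = {"كتاب", "مدرسة", "جديد"}

ARTICLE = "ال"  # the Arabic definite article


def strip_article(word: str) -> str:
    """Toy normalisation: drop a leading definite article if present."""
    if word.startswith(ARTICLE) and len(word) > 3:
        return word[len(ARTICLE):]
    return word


def is_foreign(word: str, lexicon=NATIVE_LEXICON) -> bool:
    """Treat a word as foreign when its normalised form is not in the lexicon."""
    return strip_article(word) not in lexicon


def split_tokens(tokens):
    """Partition tokens into (native, foreign) so only native ones get stemmed."""
    native = [t for t in tokens if not is_foreign(t)]
    foreign = [t for t in tokens if is_foreign(t)]
    return native, foreign


if __name__ == "__main__":
    # "كمبيوتر" is a transliteration of "computer" and is absent from the lexicon.
    native, foreign = split_tokens(["الكتاب", "كمبيوتر", "مدرسة"])
    print("native:", native)
    print("foreign:", foreign)
```

In this sketch, only the tokens returned in `native` would be passed on to a stemmer, while those in `foreign` would be left unstemmed.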

EMNLP 2006 | EMNLP 2007 | Foreign Words | Loan Words | Vocabulary Words |

Added	30 Oct 2010
Updated	30 Oct 2010
Type	Conference
Year	2006
Where	EMNLP
Authors	Abdusalam F. A. Nwesri, Seyed M. M. Tahaghoghi, Falk Scholer
