

Capturing Out-of-Vocabulary Words in Arabic Text

The increasing flow of information between languages has led to a rise in the frequency of non-native or loan words, where terms of one language appear transliterated in another. Dealing with such out-of-vocabulary words is essential for successful cross-lingual information retrieval. For example, techniques such as stemming should not be applied indiscriminately to all words in a collection, so foreign words need to be identified before any stemming is applied. In this paper, we investigate three approaches for the identification of foreign words in Arabic text: lexicons, language patterns, and n-grams. Our results show that lexicon-based approaches outperform the other techniques.
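To make two of the approaches named in the abstract concrete, the following is a minimal sketch of a lexicon lookup and a character n-gram comparison. It is not the authors' implementation: the word lists, the function names, the bigram size, and the coverage-based score are all assumptions chosen for illustration, and a real system would use large lexicons and properly estimated n-gram statistics.

```python
from collections import Counter

# Toy word lists (assumed for illustration only, not data from the paper).
NATIVE_WORDS = ["كتاب", "مكتب", "كاتب"]       # native Arabic words
FOREIGN_WORDS = ["كمبيوتر", "تلفزيون"]        # transliterated loan words


def char_ngrams(word, n=2):
    """Return the character n-grams of a word padded with boundary markers."""
    padded = f"_{word}_"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_profile(words, n=2):
    """Count the character n-grams seen in a list of training words."""
    counts = Counter()
    for word in words:
        counts.update(char_ngrams(word, n))
    return counts


def ngram_score(word, profile, n=2):
    """Fraction of the word's n-grams that also occur in the profile."""
    grams = char_ngrams(word, n)
    if not grams:
        return 0.0
    return sum(1 for g in grams if g in profile) / len(grams)


def is_foreign_by_lexicon(word, native_lexicon):
    """Lexicon approach: a word absent from the native lexicon is flagged."""
    return word not in native_lexicon


def is_foreign_by_ngrams(word, native_profile, foreign_profile, n=2):
    """N-gram approach: flag the word if it fits the foreign profile better."""
    return ngram_score(word, foreign_profile, n) > ngram_score(word, native_profile, n)


native_lexicon = set(NATIVE_WORDS)
native_profile = build_profile(NATIVE_WORDS)
foreign_profile = build_profile(FOREIGN_WORDS)
```

In a retrieval pipeline, a word flagged by either check would be left unstemmed, while the remaining words would pass on to the stemmer.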
Added 30 Oct 2010
Updated 30 Oct 2010
Type Conference
Year 2006
Authors Abdusalam F. A. Nwesri, Seyed M. M. Tahaghoghi, Falk Scholer