The Arabic Treebank (ATB) Project at the Linguistic Data Consortium (LDC) has embarked on a large corpus of Broadcast News (BN) transcriptions, and this has led to a number of new...
Mohamed Maamouri, Ann Bies, Seth Kulick, Wajdi Zag...
This paper presents an unsupervised learning approach to building a non-English (Arabic) stemmer. The stemming model is based on statistical machine translation and it uses an Eng...
For languages with rich content over the web, business reviews are easily accessible via many known websites, e.g., Yelp.com. For languages with poor content over the web like Arab...
This work evaluates a few search strategies for Arabic monolingual and cross-lingual retrieval, using the TREC Arabic corpus as the test-bed. The release by NIST in 2001 of an Ara...
Searching online information is increasingly a daily activity for many people. The multilinguality of online content is also increasing (e.g. the proportion of English web users, ...
Yaser Al-Onaizan, Radu Florian, Martin Franz, Hany...