Arabic Morphological Tagging, Diacritization, and Lemmatization Using Lexeme Models and Feature Ranking


We investigate the tasks of general morphological tagging, diacritization, and lemmatization for Arabic. We show that, for all tasks we consider, both modeling the lexeme explicitly and retuning the weights of individual classifiers for the specific task improve performance.

1 Previous Work

Arabic has about 14 dimensions of inflection (most of them orthogonal), and in our training corpus of about 288,000 words we find 3279 complete morphological tags, with up to 100,000 possible tags. Because of the large number of tags, it is clear that morphological tagging cannot be construed as a simple classification task. Hajic (2000) is the first to use a dictionary as a source of possible morphological analyses (and hence tags) for an inflected word form, and then to redefine the tagging task as a choice among the tags proposed by the dictionary, using a log-linear model trained on specific ambiguity classes for individual morphological features. Hajic et al. (2005) implement the approach o...
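The idea described above, choosing among dictionary-proposed analyses by combining per-feature classifier predictions with task-specific weights, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the feature names, the toy analyses, and the weight values are all hypothetical, and a simple weighted-agreement score stands in for whatever combination and tuning method the paper actually uses.

```python
from typing import Dict, List

# An analysis maps feature names to values. All names and values here are
# hypothetical placeholders, not real Arabic analyses.
Analysis = Dict[str, str]


def score_analysis(analysis: Analysis,
                   predictions: Dict[str, str],
                   weights: Dict[str, float]) -> float:
    """Sum the weights of the features whose classifier prediction
    agrees with the value in this candidate analysis."""
    return sum(
        weights.get(feature, 0.0)
        for feature, predicted in predictions.items()
        if analysis.get(feature) == predicted
    )


def rank_analyses(candidates: List[Analysis],
                  predictions: Dict[str, str],
                  weights: Dict[str, float]) -> List[Analysis]:
    """Order the dictionary-proposed candidates from best to worst score."""
    return sorted(
        candidates,
        key=lambda a: score_analysis(a, predictions, weights),
        reverse=True,
    )


# Toy candidates that a morphological dictionary might propose for one word.
candidates = [
    {"pos": "N", "gen": "m", "num": "s", "lexeme": "lex_a"},
    {"pos": "V", "gen": "m", "num": "s", "lexeme": "lex_b"},
    {"pos": "N", "gen": "f", "num": "p", "lexeme": "lex_c"},
]

# One prediction per feature, as if produced by individual classifiers.
predictions = {"pos": "N", "gen": "f", "num": "s", "lexeme": "lex_a"}

# Hypothetical weights; a task such as lemmatization might weight the
# lexeme feature more heavily than a task such as full tagging.
weights = {"pos": 2.0, "gen": 1.0, "num": 1.0, "lexeme": 3.0}

ranked = rank_analyses(candidates, predictions, weights)
best = ranked[0]
print(best)
```

Changing the values in `weights` changes which candidate wins, which is the sense in which the weights can be retuned for a specific task.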

Ryan Roth, Owen Rambow, Nizar Habash, Mona T. Diab, Cynthia Rudin


ACL 2008 | Complete Morphological Tags | Computational Linguistics | Individual Morphological Features | Possible Morphological Analyses


Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	ACL
Authors	Ryan Roth, Owen Rambow, Nizar Habash, Mona T. Diab, Cynthia Rudin
