Annotating an Arabic Learner Corpus for Error



This paper describes an ongoing project in which we are collecting a learner corpus of Arabic, developing a tagset for error annotation, and performing Computer-aided Error Analysis (CEA) on the data. We adapted the French Interlanguage Database (FRIDA) tagset (Granger, 2003a) to the data. We chose FRIDA in order to follow a known standard and to see whether the changes needed to move from a French to an Arabic tagset would give us a measure of the distance between the two languages with respect to learner difficulty. The current collection of texts, which is constantly growing, contains writing by intermediate and advanced-level students. We describe the need for such corpora, the learner data we have collected, and the tagset we have developed. We also describe the error frequency distribution at both proficiency levels and the ongoing work.

Ghazi Abuhakema, Reem Faraj, Anna Feldman, Eileen Fitzpatrick


Computer-aided Error Analysis | Education | Interlanguage Database FRIDA | Learner | LREC 2008


» EAGLE: an Error-Annotated Corpus of Beginning Learner German

» Towards a Motivated Annotation Schema of Collocation Errors in Learner Corpora

» Fine-Grain Morphological Analyzer and Part-of-Speech Tagger for Arabic Text

» Active Annotation in the LUNA Italian Corpus of Spontaneous Dialogues

» An Automatic Close Copy Speech Synthesis Tool for Large-Scale Speech Corpus Evaluation

» Error Correction for Arabic Dictionary Lookup

» Using the Web for Language Independent Spellchecking and Autocorrection

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	LREC
Authors	Ghazi Abuhakema, Reem Faraj, Anna Feldman, Eileen Fitzpatrick
