Adapting a resource-light highly multilingual Named Entity Recognition system to Arabic

We present a working Arabic information extraction (IE) system that analyzes large volumes of news text every day to extract the named entity (NE) types person, organization, location, date and number, as well as quotations (direct reported speech) by and about people. The Named Entity Recognition (NER) system was not developed for Arabic; instead, a highly multilingual, almost language-independent NER system was adapted to also cover Arabic. The Semitic language Arabic differs substantially from the Indo-European and Finno-Ugric languages currently covered. This paper thus describes which Arabic-specific resources had to be developed and which changes had to be made to the otherwise language-independent rule set in order for it to be applicable to Arabic. The evaluation results are generally satisfactory, but could be improved for certain entity types.

Wajdi Zaghouani, Bruno Pouliquen, Mohamed Ebrahim, Ralf Steinberger

Arabic | Arabic Information Extraction | Education | LREC 2010 | Named Entity Recognition

» Recognition and translation Arabic-French of Named Entities: case of the Sport places

» Mining Wiki Resources for Multilingual Named Entity Recognition

» NLGbAse: A Free Linguistic Resource for Natural Language Processing Systems

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2010
Where	LREC
Authors	Wajdi Zaghouani, Bruno Pouliquen, Mohamed Ebrahim, Ralf Steinberger
