ANERsys 2.0: Conquering the NER Task for the Arabic Language by Combining the Maximum Entropy with POS-tag Information

In this paper we describe an improved version of ANERsys, an Arabic Named Entity Recognition system for open-domain texts. The first version of ANERsys was based entirely on the Maximum Entropy approach and was trained and tested with corpora that we built ourselves. The results showed that Maximum Entropy is an appropriate method for identifying Named Entities in Arabic texts. However, reaching higher performance required more work on the recognition of long proper names. Therefore, in the second version of ANERsys, we use a Part Of Speech tagger and a two-step approach to enhance the performance of the system. Furthermore, we used our own corpora (ANERcorp) and gazetteers (ANERgazet), now freely available on our website, to train and evaluate ANERsys 2.0. We carried out several experiments to evaluate the performance of the system and to compare it with the freely available online demo version of the commercial system Siraj (Sakhr). The r...

Yassine Benajiba, Paolo Rosso

Arabic Named Entity | Artificial Intelligence | IICAI 2007 | Maximum Entropy | Maximum Entropy Approach |

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2007
Where	IICAI
Authors	Yassine Benajiba, Paolo Rosso
