
IJCAI
2007

Locating Complex Named Entities in Web Text

Named Entity Recognition (NER) is the task of locating and classifying names in text. In previous work, NER was limited to a small number of predefined entity classes (e.g., people, locations, and organizations). However, NER on the Web is a far more challenging problem. Complex names (e.g., film or book titles) can be very difficult to pick out precisely from text. Further, the Web contains a wide variety of entity classes, which are not known in advance. Thus, hand-tagging examples of each entity class is impractical. This paper investigates a novel approach to the first step in Web NER: locating complex named entities in Web text. Our key observation is that named entities can be viewed as a species of multiword units, which can be detected by accumulating n-gram statistics over the Web corpus. We show that this statistical method’s F1 score is 50% higher than that of supervised techniques including Conditional Random Fields (CRFs) and Conditional Markov Models (CMMs) when ap...
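The idea of treating names as multiword units detected from n-gram statistics can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not say which statistic is used, so the sketch assumes pointwise mutual information (PMI) over bigram counts, a hypothetical `threshold` parameter, and a toy in-memory corpus standing in for the Web. It also only joins adjacent capitalized tokens, so it would not recover titles that contain lowercase words.

```python
import math
from collections import Counter


def ngram_counts(sentences):
    """Accumulate unigram and bigram counts over tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def pmi(a, b, unigrams, bigrams):
    """Pointwise mutual information of the adjacent pair (a, b).

    PMI is an assumed association score, chosen for illustration only.
    Returns -inf when the pair was never observed.
    """
    if bigrams[(a, b)] == 0:
        return float("-inf")
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    p_ab = bigrams[(a, b)] / n_bi
    p_a = unigrams[a] / n_uni
    p_b = unigrams[b] / n_uni
    return math.log(p_ab / (p_a * p_b))


def locate_candidates(tokens, unigrams, bigrams, threshold):
    """Group adjacent capitalized tokens whose PMI meets the threshold."""
    spans, current = [], []
    for tok in tokens:
        if not tok[:1].isupper():
            # A lowercase token ends any candidate in progress.
            if current:
                spans.append(" ".join(current))
                current = []
            continue
        # Start a new candidate if this token is weakly associated
        # with the previous capitalized token.
        if current and pmi(current[-1], tok, unigrams, bigrams) < threshold:
            spans.append(" ".join(current))
            current = []
        current.append(tok)
    if current:
        spans.append(" ".join(current))
    return spans


# Toy corpus (hypothetical data, for illustration only).
corpus = [
    "we watched Star Wars with Alice".split(),
    "Star Wars is a film".split(),
    "Alice likes Star Wars".split(),
    "Alice met Bob".split(),
    "Bob likes film".split(),
]
unigrams, bigrams = ngram_counts(corpus)
sentence = "with Alice Star Wars is fun".split()
print(locate_candidates(sentence, unigrams, bigrams, threshold=1.0))
# ['Alice', 'Star Wars']
```

In this toy run, "Alice" and "Star" are adjacent and both capitalized, but the pair never occurs in the corpus, so they are kept apart, while "Star Wars" co-occurs often enough to be joined. No hand-tagged examples are needed, only counts.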
Doug Downey, Matthew Broadhead, Oren Etzioni
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2007
Where IJCAI