
ICDAR
2003
IEEE

A Case Restoration Approach to Named Entity Tagging in Degraded Documents

This paper describes a novel approach to named entity (NE) tagging on degraded documents. NE tagging is the process of identifying salient text strings in unstructured text that correspond to names of people, places, organizations, times/dates, etc. Although NE tagging is typically part of a larger information extraction process, it has other applications, such as improving search in an information retrieval system and post-processing the results of an OCR system. We focus on degraded documents, i.e., case-insensitive documents that lack orthographic information. Examples include the output of speech recognition systems, as well as e-mail. The traditional approach involves retraining an NE tagger on degraded text, which is a cumbersome operation. This paper describes an approach in which text is first "restored" to its implicit case-sensitive form and then processed by the original NE tagger. Results show that this new approach leads to far less precision loss in NE tagging of degraded ...
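The two-stage idea in the abstract can be illustrated with a deliberately simple sketch. The first stage restores case. The second stage reuses an unchanged case-sensitive tagger.

This sketch makes two assumptions, neither of which is the authors' model:

- **Restoration step.** It uses a most-frequent-casing lookup learned from case-sensitive text.
- **Tagger.** A hypothetical capitalization-based function, `toy_ne_tagger`, stands in for a real NE tagger.

All function names and the training sentences are illustrative only.

```python
from collections import Counter, defaultdict


def train_case_model(sentences):
    """Learn the most frequent surface form of each lowercased token
    from case-sensitive training text (a simple baseline, assumed here)."""
    counts = defaultdict(Counter)
    for sent in sentences:
        # Skip the sentence-initial token: its capital reflects position, not case.
        for tok in sent.split()[1:]:
            counts[tok.lower()][tok] += 1
    return {low: forms.most_common(1)[0][0] for low, forms in counts.items()}


def restore_case(text, model):
    """Map each token of degraded (lowercase) text to its most frequent casing;
    unseen tokens are left unchanged, and the first token is capitalized."""
    tokens = [model.get(tok.lower(), tok) for tok in text.split()]
    if tokens:
        tokens[0] = tokens[0][:1].upper() + tokens[0][1:]
    return " ".join(tokens)


def toy_ne_tagger(text):
    """Hypothetical stand-in for a case-sensitive NE tagger: it returns
    capitalized non-initial tokens as candidate names."""
    tokens = text.split()
    return [tok for i, tok in enumerate(tokens) if i > 0 and tok[:1].isupper()]


if __name__ == "__main__":
    training = [
        "We met John Smith in Paris last week .",
        "Later the team flew to Paris with John .",
        "The report from Smith arrived in the morning .",
    ]
    model = train_case_model(training)
    degraded = "yesterday john smith flew to paris ."
    print(toy_ne_tagger(degraded))  # [] : no case cues survive
    restored = restore_case(degraded, model)
    print(restored)  # Yesterday John Smith flew to Paris .
    print(toy_ne_tagger(restored))  # ['John', 'Smith', 'Paris']
```

The design point the sketch mirrors is that only the restoration stage needs to see case-insensitive text. The tagger itself is reused as-is and is not retrained on degraded input. A realistic restorer would use context rather than a per-token lookup.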
Rohini K. Srihari, Cheng Niu, Wei Li, Jihong Ding
Type Conference
Year 2003
Where ICDAR