A Case Restoration Approach to Named Entity Tagging in Degraded Documents



This paper describes a novel approach to named entity (NE) tagging on degraded documents. NE tagging is the process of identifying salient text strings in unstructured text, corresponding to names of people, places, organizations, times/dates, etc. Although NE tagging is typically part of a larger information extraction process, it has other applications, such as improving search in an information retrieval system and post-processing the results of an OCR system. We focus on degraded documents, i.e., case-insensitive documents that lack orthographic information. Examples include the output of speech recognition systems, as well as e-mail. The traditional approach involves retraining an NE tagger on degraded text, a cumbersome operation. This paper describes an approach whereby text is first "restored" to its implicit case-sensitive form and subsequently processed by the original NE tagger. Results show that this new approach leads to far less precision loss in NE tagging of degraded ...
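The restore-then-tag pipeline described in the abstract can be sketched as follows. This is not the authors' case restoration model: it swaps in a simple unigram "most frequent casing" truecaser, and the names `train_case_model`, `restore_case`, and `toy_ne_tagger`, as well as the training sentences, are hypothetical. The tagger is a toy stand-in for an existing case-sensitive NE tagger that relies on capitalization, which is exactly the cue degraded text lacks.

```python
from collections import Counter, defaultdict


def train_case_model(cased_sentences):
    """Learn the most frequent surface form of each word from case-sensitive text.

    Hypothetical unigram truecaser, not the paper's restoration model.
    """
    counts = defaultdict(Counter)
    for sentence in cased_sentences:
        for i, token in enumerate(sentence.split()):
            # Skip sentence-initial tokens: their capital letter is positional, not lexical.
            if i == 0:
                continue
            counts[token.lower()][token] += 1
    return {word: forms.most_common(1)[0][0] for word, forms in counts.items()}


def restore_case(text, model):
    """Map each token to its most frequent cased form; capitalize the sentence start."""
    restored = []
    for i, token in enumerate(text.split()):
        # Unseen words are left in lowercase.
        form = model.get(token.lower(), token.lower())
        if i == 0:
            form = form[:1].upper() + form[1:]
        restored.append(form)
    return " ".join(restored)


def toy_ne_tagger(text):
    """Hypothetical case-sensitive tagger: marks capitalized non-initial tokens as NE."""
    tokens = text.split()
    return [(tok, "NE" if i > 0 and tok[:1].isupper() else "O")
            for i, tok in enumerate(tokens)]


# Illustrative training data (invented for this sketch).
model = train_case_model([
    "the talk by Alice Smith was in Paris",
    "we met Alice at the conference",
    "the city of Paris hosted it",
])
degraded = "yesterday alice smith spoke in paris"
print(toy_ne_tagger(degraded))          # every token is tagged "O"
restored = restore_case(degraded, model)
print(restored)                         # Yesterday Alice Smith spoke in Paris
print(toy_ne_tagger(restored))          # Alice, Smith, Paris are tagged "NE"
```

The design choice the sketch illustrates is that the tagger itself is never retrained: only the restoration step operates on the degraded input, which matches the abstract's motivation for avoiding a cumbersome retraining of the NE tagger on degraded text.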

Rohini K. Srihari, Cheng Niu, Wei Li, Jihong Ding


Document Analysis | ICDAR 2003 | NE Tagger | Original NE Tagger | Salient Text Strings


Post Info

Added	04 Jul 2010
Updated	04 Jul 2010
Type	Conference
Year	2003
Where	ICDAR
Authors	Rohini K. Srihari, Cheng Niu, Wei Li, Jihong Ding


Sciweavers
