Orthographic Case Restoration Using Supervised Learning Without Manual Annotation

Full text available from www.aaai.org

One challenge in text processing is the treatment of case-insensitive documents, that is, text without reliable capitalization such as speech recognition output. The traditional approach is to retrain a language model with case-related features excluded. This paper presents an alternative two-step approach in which a preprocessing module (Step 1) restores the case-sensitive form of the text before it is fed to the core system (Step 2). Step 1 is implemented as a Hidden Markov Model trained on a large raw corpus of case-sensitive documents. The paper demonstrates that this approach (i) outperforms the feature-exclusion approach for Named Entity tagging, (ii) causes only limited degradation for semantic parsing and relationship extraction, (iii) reduces system complexity, and (iv) is widely applicable: the restored text can feed both statistical-model-based and rule-based systems.
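The abstract does not specify the states, features, or smoothing of the Step 1 model, so the following is only a minimal sketch of the general idea under our own assumptions, not the authors' implementation. It assumes a first-order HMM whose hidden states are four hypothetical case tags (`lower`, `cap`, `upper`, `mixed`) and whose observations are lowercased tokens. The training labels are read directly off cased text by a hypothetical `case_tag` helper, which is what allows supervised training without manual annotation. The class name `CaseHMM`, the tag dictionary restricting known words to tags seen in training, and the add-one smoothing of transitions are all illustrative choices. Decoding uses the standard Viterbi algorithm.

```python
import math
from collections import Counter, defaultdict

TAGS = ("lower", "cap", "upper", "mixed")


def case_tag(token):
    """Read a case label off a cased token, so raw cased text supplies the training labels."""
    if token.islower() or not any(ch.isalpha() for ch in token):
        return "lower"  # e.g. "cat", "."
    if token[:1].isupper() and not any(ch.isupper() for ch in token[1:]):
        return "cap"  # e.g. "John", "I"
    if token.isupper():
        return "upper"  # e.g. "IBM"
    return "mixed"  # e.g. "McDonald", "iPhone"


class CaseHMM:
    """First-order HMM: hidden states are case tags, observations are lowercased words."""

    def __init__(self, sentences):
        self.trans = defaultdict(Counter)  # trans[prev_tag][tag] -> count
        self.emit = defaultdict(Counter)  # emit[tag][word] -> count
        self.tag_totals = Counter()  # tokens observed per tag
        self.tag_dict = defaultdict(set)  # word -> tags seen with it in training
        self.mixed = defaultdict(Counter)  # word -> observed mixed-case spellings
        for sent in sentences:
            prev = "<s>"
            for tok in sent:
                tag, word = case_tag(tok), tok.lower()
                self.trans[prev][tag] += 1
                self.emit[tag][word] += 1
                self.tag_totals[tag] += 1
                self.tag_dict[word].add(tag)
                if tag == "mixed":
                    self.mixed[word][tok] += 1
                prev = tag

    def allowed(self, word):
        # Known words may only take tags seen in training; unknown words may take any tag.
        seen = self.tag_dict.get(word)
        return tuple(t for t in TAGS if t in seen) if seen else TAGS

    def log_trans(self, prev, tag):
        row = self.trans[prev]  # add-one smoothing over the four tags
        return math.log((row[tag] + 1) / (sum(row.values()) + len(TAGS)))

    def log_emit(self, tag, word):
        if word not in self.tag_dict:
            return 0.0  # unknown word: same score for every tag, so transitions decide
        return math.log(self.emit[tag][word] / self.tag_totals[tag])

    def recase(self, word, tag):
        if tag == "upper":
            return word.upper()
        if tag == "mixed" and word in self.mixed:
            return self.mixed[word].most_common(1)[0][0]
        if tag in ("cap", "mixed"):
            return word[:1].upper() + word[1:]
        return word

    def restore(self, words):
        """Viterbi-decode the most likely case tags for the words and re-case them."""
        words = [w.lower() for w in words]
        if not words:
            return []
        # Each column maps tag -> (best log score of a path ending in that tag, previous tag).
        best = [{t: (self.log_trans("<s>", t) + self.log_emit(t, words[0]), None)
                 for t in self.allowed(words[0])}]
        for word in words[1:]:
            column = {}
            for tag in self.allowed(word):
                prev, score = max(((p, s + self.log_trans(p, tag))
                                   for p, (s, _) in best[-1].items()),
                                  key=lambda item: item[1])
                column[tag] = (score + self.log_emit(tag, word), prev)
            best.append(column)
        tag = max(best[-1], key=lambda t: best[-1][t][0])  # best final state
        tags = [tag]
        for column in reversed(best[1:]):  # follow back-pointers to position 0
            tag = column[tag][1]
            tags.append(tag)
        tags.reverse()
        return [self.recase(w, t) for w, t in zip(words, tags)]


# Toy stand-in for the "large raw corpus of case-sensitive documents".
corpus = [s.split() for s in [
    "The cat sat on the mat .",
    "John saw the cat in New York .",
    "The dog met John .",
    "IBM hired John .",
    "John fed the dog and the cat .",
]]
model = CaseHMM(corpus)
print(" ".join(model.restore("john saw the dog in new york .".split())))
# prints: John saw the dog in New York .
```

In the two-step design, the restored tokens would then be passed unchanged to the Step 2 system, for example a Named Entity tagger trained on case-sensitive text. Note the trade-off in the tag dictionary: it keeps this toy model from giving unusual cases to frequent words, but a word seen only in lowercase can never be restored as capitalized, so a realistic system trained on a large corpus would need a softer, smoothed restriction.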

Cheng Niu, Wei Li 0003, Jihong Ding, Rohini K. Srihari

Alternative Two-step Approach | Artificial Intelligence | Case Insensitive Documents | Feature Exclusion Approach | FLAIRS 2003

Post Info

Added	31 Oct 2010
Updated	31 Oct 2010
Type	Conference
Year	2003
Where	FLAIRS
Authors	Cheng Niu, Wei Li 0003, Jihong Ding, Rohini K. Srihari

Sciweavers