tRuEcasIng

Truecasing is the process of restoring case information to badly-cased or non-cased text. This paper explores truecasing issues and proposes a statistical, language-modeling-based truecaser which achieves an accuracy of ∼98% on news articles. Task-based evaluation shows a 26% F-measure improvement in named entity recognition when using truecasing. In the context of automatic content extraction, mention detection on automatic speech recognition text is also improved by a factor of 8. Truecasing also enhances machine translation output legibility and yields a BLEU score improvement of 80.2%. This paper argues for the use of truecasing as a valuable component in text processing applications.
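To make the task concrete, here is a minimal sketch of a statistical truecaser. It is not the paper's model: it assumes a much simpler unigram baseline that restores each token to the surface form seen most often in cased training text. The function names `train_truecaser` and `truecase` and the tiny corpus are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def train_truecaser(sentences):
    """Count the surface forms observed for each lowercased token.

    Sentence-initial tokens are skipped because their capitalization
    is positional and says little about the word's usual casing.
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        for token in sentence.split()[1:]:
            counts[token.lower()][token] += 1
    return counts


def truecase(text, counts):
    """Restore case by picking the most frequent surface form per token."""
    restored = []
    for token in text.split():
        forms = counts.get(token.lower())
        # Tokens never seen in training are left unchanged.
        restored.append(forms.most_common(1)[0][0] if forms else token)
    if restored:
        # Capitalize the first character of the sentence, keep the rest.
        restored[0] = restored[0][:1].upper() + restored[0][1:]
    return " ".join(restored)


# Hypothetical cased training text (illustration only).
corpus = [
    "Yesterday the analysts at IBM met in Boston",
    "People say IBM opened an office in Boston",
]
model = train_truecaser(corpus)
print(truecase("ibm opened an office in boston", model))
# prints: IBM opened an office in Boston
```

A unigram baseline like this cannot tell "new" in "a new office" from "New" in "New York", since it ignores neighboring words. A context-sensitive language model, as the abstract describes, is the natural way to handle such cases.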

Lucian Vlad Lita, Abraham Ittycheriah, Salim Roukos, Nanda Kambhatla

ACL 2003 | ACL 2007 | BLEU Score Improvement | Named Entity Recognition | Speech Recognition Text

Post Info

Added	31 Oct 2010
Updated	31 Oct 2010
Type	Conference
Year	2003
Where	ACL
Authors	Lucian Vlad Lita, Abraham Ittycheriah, Salim Roukos, Nanda Kambhatla
