Sciweavers

ACL
2003

tRuEcasIng

Truecasing is the process of restoring case information to badly cased or non-cased text. This paper explores truecasing issues and proposes a statistical, language-modeling-based truecaser which achieves an accuracy of ∼98% on news articles. Task-based evaluation shows a 26% F-measure improvement in named entity recognition when using truecasing. In the context of automatic content extraction, mention detection on automatic speech recognition text is also improved by a factor of 8. Truecasing also enhances machine translation output legibility and yields a BLEU score improvement of 80.2%. This paper argues for the use of truecasing as a valuable component in text processing applications.
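To make the task concrete, here is a minimal sketch of statistical truecasing. It is not the paper's model: it assumes only unigram surface-form counts rather than a full language model, and the function names `train_truecaser` and `truecase` and the toy corpus are hypothetical, chosen for illustration.

```python
from collections import Counter, defaultdict


def train_truecaser(sentences):
    """Count how often each cased surface form occurs per lowercased token.

    Assumption: sentence-initial tokens are skipped, because their
    capital letter reflects position rather than the word's true case.
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for token in tokens[1:]:
            counts[token.lower()][token] += 1
    return counts


def truecase(text, counts):
    """Restore case by picking the most frequent observed form of each token.

    Unseen tokens fall back to lowercase; the first token of the output
    is then capitalized as a sentence start.
    """
    out = []
    for token in text.split():
        forms = counts.get(token.lower())
        if forms:
            out.append(forms.most_common(1)[0][0])
        else:
            out.append(token.lower())
    if out:
        out[0] = out[0][0].upper() + out[0][1:]
    return " ".join(out)


# Toy corpus, invented for this example.
corpus = [
    "The IBM team met in New York",
    "Researchers at IBM visited New York",
]
counts = train_truecaser(corpus)
print(truecase("the ibm team visited new york", counts))
# prints "The IBM team visited New York"
```

A unigram model like this cannot resolve tokens whose case depends on context (for example "us" versus "US"), which is one reason to condition on neighboring words, as a language-model-based approach does.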
Lucian Vlad Lita, Abraham Ittycheriah, Salim Roukos, Nanda Kambhatla
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2003
Where ACL
Authors Lucian Vlad Lita, Abraham Ittycheriah, Salim Roukos, Nanda Kambhatla