
SIGIR
2000
ACM

Document centered approach to text normalization

In this paper we present an approach to tackle three important problems of text normalization: sentence boundary disambiguation, disambiguation of capitalized words when they are used in positions where capitalization is expected, and identification of abbreviations. The main feature of our approach is that it uses a minimum of pre-built resources, instead dynamically inferring disambiguation clues from the entire document itself. This makes it domain independent, closely targeted to each individual document and portable to other languages. We thoroughly evaluated this approach on several corpora and it showed high accuracy.
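The document-centered idea for capitalized words can be illustrated with a minimal sketch. It assumes a simplified heuristic: a capitalized word in a sentence-initial position is treated as a common word if the same document also uses it in lowercase and never capitalizes it mid-sentence, and as a proper name in the opposite case. Otherwise it is left undecided. This is not the paper's exact algorithm. The function name `classify_capitalized`, the regex tokenizer and the rule that every `.`, `!` or `?` ends a sentence (ignoring abbreviations, which the paper itself handles) are hypothetical simplifications.

```python
import re

# Hypothetical tokenizer: alphabetic words plus sentence-final punctuation.
TOKEN_RE = re.compile(r"[A-Za-z]+|[.!?]")
# Simplifying assumption: every one of these marks ends a sentence.
SENTENCE_END = {".", "!", "?"}


def classify_capitalized(text: str) -> dict[str, str]:
    """Label sentence-initial capitalized words using clues from the same document."""
    tokens = TOKEN_RE.findall(text)
    lower_seen = set()    # words the document uses in lowercase
    cap_mid_seen = set()  # words the document capitalizes mid-sentence
    ambiguous = []        # capitalized words in sentence-initial positions
    prev = None
    for tok in tokens:
        if tok in SENTENCE_END:
            prev = tok
            continue
        sentence_initial = prev is None or prev in SENTENCE_END
        if tok[0].isupper():
            if sentence_initial:
                ambiguous.append(tok)  # capitalization tells us nothing here
            else:
                cap_mid_seen.add(tok.lower())
        else:
            lower_seen.add(tok.lower())
        prev = tok

    decisions = {}
    for word in ambiguous:
        key = word.lower()
        if key in lower_seen and key not in cap_mid_seen:
            decisions[word] = "common"
        elif key in cap_mid_seen and key not in lower_seen:
            decisions[word] = "proper"
        else:
            decisions[word] = "undecided"  # the document offers no clear clue
    return decisions


print(classify_capitalized("Rain fell on Monday. The rain stopped. Monday was dry."))
```

On this input "Rain" is labeled common (the document also writes "rain"), "Monday" proper (it appears capitalized mid-sentence), and "The" undecided, because this short document never uses "the" in lowercase. Cases like that last one are where some external resource would still be needed.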
Andrei Mikheev
Type Conference
Year 2000
Where SIGIR
Authors Andrei Mikheev