SIGIR
2000
ACM

Document centered approach to text normalization

In this paper we present an approach to tackle three important problems of text normalization: sentence boundary disambiguation, disambiguation of capitalized words when they are used in positions where capitalization is expected, and identification of abbreviations. The main feature of our approach is that it uses a minimum of pre-built resources, instead dynamically inferring disambiguation clues from the entire document itself. This makes it domain independent, closely targeted to each individual document and portable to other languages. We thoroughly evaluated this approach on several corpora and it showed high accuracy.
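
To illustrate what "inferring disambiguation clues from the entire document" could look like for capitalized words, here is a minimal Python sketch. It is not the paper's algorithm: the function name `classify_sentence_initial_words`, the simple counting rule, and the naive sentence splitting on `.`, `!` and `?` are all assumptions made for illustration. The idea shown is only that a word capitalized at the start of a sentence can be checked against its unambiguous mid-sentence occurrences elsewhere in the same document.

```python
import re
from collections import Counter

SENTENCE_END = {".", "!", "?"}


def classify_sentence_initial_words(text):
    """Label capitalized sentence-initial words using evidence from the
    rest of the same document (hypothetical helper, illustration only).

    Returns a dict mapping each such word to "proper", "common" or "unknown".
    """
    # Naive tokenization: alphabetic words and sentence-final punctuation.
    # Assumption: every ".", "!" or "?" ends a sentence, so abbreviations
    # such as "Dr." would be mishandled by this sketch.
    tokens = re.findall(r"[A-Za-z]+|[.!?]", text)

    lower_mid = Counter()  # lowercase occurrences in mid-sentence position
    upper_mid = Counter()  # capitalized occurrences in mid-sentence position
    ambiguous = []         # capitalized words in sentence-initial position

    sentence_start = True
    for tok in tokens:
        if tok in SENTENCE_END:
            sentence_start = True
            continue
        if sentence_start:
            # Capitalization is expected here, so it carries no evidence.
            if tok[0].isupper():
                ambiguous.append(tok)
        elif tok[0].isupper():
            upper_mid[tok.lower()] += 1
        else:
            lower_mid[tok.lower()] += 1
        sentence_start = False

    result = {}
    for word in ambiguous:
        key = word.lower()
        if upper_mid[key] > lower_mid[key]:
            result[word] = "proper"
        elif lower_mid[key] > upper_mid[key]:
            result[word] = "common"
        else:
            result[word] = "unknown"  # no usable evidence in this document
    return result
```

For a document such as "Rice was served. Brown spoke first. Later Brown praised the rice. Everyone thanked Brown.", this sketch labels "Rice" as a common word (it occurs in lowercase mid-sentence) and "Brown" as a proper name (it occurs capitalized mid-sentence), while "Later" stays unknown because the document offers no other occurrence. No dictionary or gazetteer is consulted, which mirrors the abstract's emphasis on a minimum of pre-built resources.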

Andrei Mikheev

Capitalized Words | Disambiguation Clues | Information Management | Sentence Boundary Disambiguation | SIGIR 2000

Post Info

Added	01 Aug 2010
Updated	01 Aug 2010
Type	Conference
Year	2000
Where	SIGIR
Authors	Andrei Mikheev
