PAMI
2002

Imaged Document Text Retrieval Without OCR

We propose a method for retrieving text from document images without OCR. Each document is segmented into character objects, and two image features are extracted from every object: the Vertical Traverse Density (VTD) and the Horizontal Traverse Density (HTD). From these features, an n-gram based document vector is constructed for each document. Text similarity between two documents is then measured as the dot product of their document vectors. Tests on seven corpora of imaged textual documents in English and Chinese, as well as on images from the UW1 database, confirm the validity of the proposed method.
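The pipeline in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: character objects are taken to be already segmented into small binary grids (1 = ink), "traverse density" is read as the number of background-to-ink transitions along each scan line, a character's exact (VTD, HTD) pair serves directly as its n-gram symbol, and document vectors are scaled to unit length so that the dot product lies between 0 and 1. All function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def traverse_density(lines):
    """Count background-to-ink transitions along each scan line (assumed definition)."""
    return tuple(
        sum(1 for prev, cur in zip((0,) + tuple(line), line) if cur and not prev)
        for line in lines
    )


def char_code(glyph):
    """Map a binary glyph (list of rows of 0/1) to a hashable (VTD, HTD) pair."""
    columns = list(zip(*glyph))
    vtd = traverse_density(columns)  # scan each column top to bottom
    htd = traverse_density(glyph)    # scan each row left to right
    return (vtd, htd)


def document_vector(glyphs, n=3):
    """Build a unit-length n-gram count vector over the sequence of character codes."""
    codes = [char_code(g) for g in glyphs]
    grams = Counter(tuple(codes[i:i + n]) for i in range(len(codes) - n + 1))
    norm = sqrt(sum(c * c for c in grams.values()))
    # Unit-length scaling is an assumption, not something the abstract states.
    return {g: c / norm for g, c in grams.items()} if norm else {}


def similarity(u, v):
    """Dot product of two sparse document vectors."""
    return sum(w * v.get(g, 0.0) for g, w in u.items())


# Two toy glyphs: a vertical bar and two stacked horizontal bars.
BAR = [[1], [1], [1]]
EQUALS = [[1, 1], [0, 0], [1, 1]]

doc_a = document_vector([BAR, EQUALS, BAR, EQUALS], n=2)
doc_b = document_vector([BAR, BAR, BAR], n=2)
print(similarity(doc_a, doc_a), similarity(doc_a, doc_b))
```

Matching on exact feature tuples is brittle for real scans, where noise and font size change the raw counts. A practical system would need to normalize or quantize the features before forming n-grams; the sketch leaves that step out.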
Chew Lim Tan, Weihua Huang, Zhaohui Yu, Yi Xu
Added 23 Dec 2010
Updated 23 Dec 2010
Type Journal
Year 2002
Where PAMI