ICDAR
1997
IEEE

Representing OCRed documents in HTML

Download www.cedar.buffalo.edu

ABSTRACT: OCR is an error-prone process, and manually proofreading OCR results is time-consuming and expensive. Errors that remain in OCRed text can seriously hinder reading and understanding when the reader cannot refer to the original image representation. As demonstrated in this paper, a hybrid document that combines symbolic representation and image representation may relieve the problem. If an OCRed document is represented properly in HTML, OCR errors will have little negative effect on human reading in an HTML browser and can be corrected with an HTML authoring tool. An experiment that uses this approach to evaluate a Japanese OCR system developed at CEDAR is also reported in this paper.

1 Overview of the Approach

OCR is a process that transforms a given document from its image representation into its symbolic representation. After this process, we obtain a text document that is electronically searchable, indexable and reusable. However, the transformation is error-prone...
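
The abstract does not say how the symbolic and image representations are combined, so the following is only an illustrative sketch of one possible reading, not the authors' method. It assumes each recognized word comes with a confidence score and a cropped word image. The function name `render_hybrid_html`, the 0.8 threshold, and the file names are all hypothetical. Words above the threshold are emitted as text. Words below it are emitted as an inline image of the original scan, with the OCR guess kept in the `alt` attribute so it stays searchable and can later be corrected in an HTML editor.

```python
from html import escape


def render_hybrid_html(words, threshold=0.8):
    """Build one HTML paragraph from OCR output.

    `words` is a list of (text, confidence, image_path) tuples.
    This layout is an assumption made for illustration only.
    """
    parts = []
    for text, confidence, image_path in words:
        if confidence >= threshold:
            # Trusted word: keep the symbolic (text) representation.
            parts.append(escape(text))
        else:
            # Doubtful word: show the original image snippet and keep
            # the OCR guess in `alt` for search and later correction.
            parts.append(
                '<img src="{}" alt="{}">'.format(
                    escape(image_path), escape(text)
                )
            )
    return "<p>" + " ".join(parts) + "</p>"


if __name__ == "__main__":
    sample = [("OCR", 0.95, "w0.png"), ("eect", 0.40, "w1.png")]
    print(render_hybrid_html(sample))
```

With this layout a reader sees the scanned pixels wherever recognition is doubtful, which is the kind of fallback to the original image that the abstract argues for.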

Tao Hong, Sargur N. Srihari

Document Analysis | ICDAR 1997 | Image Representation | OCR Errors | OCRed Text

Related Content

» Using visual cues for extraction of tabular data from arbitrary HTML documents

» Efficient automatic OCR word validation using word partial format derivation and language ...

» A Corpus for Comparative Evaluation of OCR Software and Postcorrection Techniques

» An OCR Free Method for Word Spotting in Printed Documents the Evaluation of Different Feat...

» Generation documentation and presentation of mathematical equations and symbolic scientifi...

» Significance of HTML Tags for Document Indexing and Retrieval

» iCube A ToolSet for the Dynamic Extraction and Integration of Web Data Content

» Context-Sensitive Error Correction Using Topic Models to Improve OCR

» MergeLayouts Overcoming Faulty Segmentations by a Comprehensive Voting of Commercial OCR D...

Post Info

Added	06 Aug 2010
Updated	06 Aug 2010
Type	Conference
Year	1997
Where	ICDAR
Authors	Tao Hong, Sargur N. Srihari
