
ICDAR
2003
IEEE

Document Transformation System from Papers to XML Data Based on Pivot XML Document Method

This paper proposes a method for transforming printed documents into various XML documents using OCR. The method adopts a hierarchical transformation strategy based on a pivot XML document. First, document elements such as the title, authors, abstract, headings, paragraphs, lists, captions, tables, and figures are extracted from document images. Second, the hierarchical structure of these elements is extracted and described as a DOM tree. Third, this document structure is converted by an XML parser into a pivot XML document written in XHTML. Finally, the pivot XML document is transformed into the target XML document by the XML parser, using XSLT scripts or specific programs. Experimental results show that the method is effective in transforming printed documents into various XML documents.
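The pivot idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes the OCR stage has already produced a flat list of typed elements (the hypothetical ELEMENTS list), groups them under headings in a simplified, namespace-free XHTML-like pivot tree, and then derives one target format with a "specific program" in place of an XSLT script. The function names and the target vocabulary (article, section, para) are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical OCR output in reading order: (element type, heading level, text).
ELEMENTS = [
    ("title", None, "Sample Paper"),
    ("authors", None, "A. Author"),
    ("heading", 1, "Introduction"),
    ("paragraph", None, "First paragraph."),
    ("heading", 1, "Method"),
    ("paragraph", None, "Second paragraph."),
]


def build_pivot(elements):
    """Build a simplified XHTML-like pivot tree (XHTML namespace omitted)."""
    html = ET.Element("html")
    head = ET.SubElement(html, "head")
    body = ET.SubElement(html, "body")
    container = body  # elements before the first heading go directly in body
    for kind, level, text in elements:
        if kind == "title":
            ET.SubElement(head, "title").text = text
        elif kind == "heading":
            # Assumption: one level of sections only; each heading opens a new div.
            container = ET.SubElement(body, "div", {"class": "section"})
            ET.SubElement(container, f"h{level}").text = text
        else:
            ET.SubElement(container, "p", {"class": kind}).text = text
    return html


def pivot_to_target(pivot):
    """Transform the pivot tree into a hypothetical target vocabulary."""
    article = ET.Element("article")
    ET.SubElement(article, "title").text = pivot.findtext("head/title")
    for p in pivot.findall("body/p"):
        if p.get("class") == "authors":
            ET.SubElement(article, "author").text = p.text
    for div in pivot.findall("body/div"):
        # The first child of each section div is its heading element.
        section = ET.SubElement(article, "section", {"name": div[0].text})
        for p in div.findall("p"):
            ET.SubElement(section, "para").text = p.text
    return article


pivot = build_pivot(ELEMENTS)
target = pivot_to_target(pivot)
print(ET.tostring(target, encoding="unicode"))
```

Because every target format is derived from the same pivot tree, supporting another output in this sketch would only require another function like pivot_to_target (or an XSLT stylesheet applied with an XSLT-capable library), while the extraction and structuring steps stay unchanged.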
Yasuto Ishitani
Added 04 Jul 2010
Updated 04 Jul 2010
Type Conference
Year 2003
Where ICDAR
Authors Yasuto Ishitani