
ICDAR
2003
IEEE


Document Transformation System from Papers to XML Data Based on Pivot XML Document Method


This paper proposes a new method for document transformation using OCR to generate various XML documents from printed documents. The proposed method adopts a hierarchical transformation strategy based on a pivot XML document. Firstly, document elements such as title, authors, abstract, headings, paragraphs, lists, captions, tables and figures are extracted from document images. Secondly, the hierarchical structure of document elements is extracted and is described using a DOM tree. Thirdly, this document structure is converted into a pivot XML document described as an XHTML document by an XML parser. Finally, this pivot XML document is transformed into the target XML document by the XML parser with XSLT scripts or specific programs. Experimental results show the method is effective in transforming printed documents to various XML documents.
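
The pivot idea in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the `extracted` list stands in for the OCR and layout-analysis output, the `PIVOT_TAGS` mapping, the `build_pivot` and `pivot_to_target` functions, and the `<article>` target vocabulary are all hypothetical, and the XHTML namespace is omitted for brevity. The last step is written as a "specific program" rather than an XSLT script, since the Python standard library has no XSLT processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical output of the element-extraction step: (element type, text) pairs.
extracted = [
    ("title", "Sample Paper"),
    ("authors", "A. Author"),
    ("abstract", "A short abstract."),
    ("heading", "Introduction"),
    ("paragraph", "First paragraph."),
    ("heading", "Method"),
    ("paragraph", "Second paragraph."),
]

# Assumed mapping from element types to XHTML tags in the pivot document.
PIVOT_TAGS = {"title": "h1", "heading": "h2", "paragraph": "p"}


def build_pivot(elements):
    """Build an XHTML-like pivot tree from the extracted document elements."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    for kind, text in elements:
        tag = PIVOT_TAGS.get(kind)
        if tag is None:
            # Element types without a dedicated tag become classed <div>s.
            node = ET.SubElement(body, "div", {"class": kind})
        else:
            node = ET.SubElement(body, tag)
        node.text = text
    return html


def pivot_to_target(pivot):
    """Convert the pivot tree into a hypothetical <article> vocabulary."""
    article = ET.Element("article")
    section = None
    for node in pivot.find("body"):
        if node.tag == "h1":
            ET.SubElement(article, "title").text = node.text
        elif node.tag == "div":
            ET.SubElement(article, node.get("class")).text = node.text
        elif node.tag == "h2":
            # Each heading opens a new section that collects later paragraphs.
            section = ET.SubElement(article, "section", {"heading": node.text})
        elif node.tag == "p":
            parent = section if section is not None else article
            ET.SubElement(parent, "para").text = node.text
    return article


pivot = build_pivot(extracted)
target = pivot_to_target(pivot)
print(ET.tostring(target, encoding="unicode"))
```

Because every target format is derived from the same pivot tree, supporting another XML vocabulary in this sketch only means writing another function like `pivot_to_target`; the extraction and pivot-building steps stay unchanged.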

Yasuto Ishitani


Document Analysis | Documents | ICDAR 2003 | Pivot Xml Document | XML Documents |


Related Content

» Using AutoMed for XML data transformation and integration

» On Efficient Partmatch Querying of XML Data

» GNDTD Graphical Notations for Describing XML Documents

» RuleBased Generation of XML Schemas from UML Class Diagrams

» On Efficient and Effective Association Rule Mining from XML Data

» Querying transformed XML documents Determining a sufficient fragment of the original docum...

» Selectively Storing XML Data in Relations

» Validation of XML Documents From UML Models to XML Schemas and XSLT Stylesheets

» XML and Knowledge Technologies for SemanticBased Indexing of Paper Documents

Post Info

Added	04 Jul 2010
Updated	04 Jul 2010
Type	Conference
Year	2003
Where	ICDAR
Authors	Yasuto Ishitani
