DAS
2006
Springer

A System for Converting PDF Documents into Structured XML Format

This paper presents a system for converting legacy PDF documents into structured XML. The system first extracts the different streams contained in a PDF file (text, bitmap images and vector images), then applies a series of components to express the logical structure of the document in XML. Some of these components are traditional in document analysis; others are more specific to PDF. We also present a graphical user interface for checking, correcting and validating the output of the components. Finally, we report on two real-world cases in which the system was applied.
Hervé Déjean, Jean-Luc Meunier
Added 22 Aug 2010
Updated 22 Aug 2010
Type Conference
Year 2006
Where DAS
Authors Hervé Déjean, Jean-Luc Meunier