XCDF: A Canonical and Structured Document Format

Download www.bloechle.ch

Accessing the structured content of PDF documents is a difficult task, requiring pre-processing and reverse engineering techniques. In this paper, we first present different methods to accomplish this task, which are based either on document image analysis or on electronic content extraction. Then XCDF, a canonical format with well-defined properties, is proposed as a suitable solution for representing structured electronic documents and as an entry point for further research and work. The system and methods used for reverse engineering PDF documents into this canonical format are also presented. Finally, we present current applications of this work in various domains, ranging from data mining to multimedia navigation, all of which benefit from our canonical format to access PDF document content and structures.

Jean-Luc Bloechle, Maurizio Rigamonti, Karim Hadjar

Canonical Format | DAS 2006 | Document Analysis | Pdf Document | Reverse Engineering |

» Towards a Canonical and Structured Representation of PDF Documents through Reverse Enginee...

Post Info

Added	13 Oct 2010
Updated	13 Oct 2010
Type	Conference
Year	2006
Where	DAS
Authors	Jean-Luc Bloechle, Maurizio Rigamonti, Karim Hadjar, Denis Lalanne, Rolf Ingold
