OCD: An Optimized and Canonical Document Format

Revealing and manipulating the structured content of PDF documents is a difficult task that requires pre-processing and reverse engineering techniques. In this paper, we present OCD, an optimized, easy-to-process and canonical format for representing structured electronic documents. We describe the system and methods used to reverse engineer PDF documents into the OCD format, as well as the techniques used to optimize it. Finally, we report concrete evaluations of the OCD format's compactness and restructuring performance.

Jean-Luc Bloechle, Denis Lalanne, Rolf Ingold

Document Analysis | ICDAR 2009 | OCD Format | PDF Documents | Reverse Engineering |

Post Info

Added	18 Feb 2011
Updated	18 Feb 2011
Type	Journal
Year	2009
Where	ICDAR
Authors	Jean-Luc Bloechle, Denis Lalanne, Rolf Ingold
