Sciweavers

DAS
2006
Springer
XCDF: A Canonical and Structured Document Format
Accessing the structured content of PDF documents is a difficult task, requiring pre-processing and reverse engineering techniques. In this paper, we first present different methods...
Jean-Luc Bloechle, Maurizio Rigamonti, Karim Hadja...
DOCENG
2003
ACM
Creating reusable well-structured PDF as a sequence of component object graphic (COG) elements
Portable Document Format (PDF) is a page-oriented, graphically rich format based on PostScript semantics and it is also the format interpreted by the Adobe Acrobat viewers. Althou...
Steven R. Bagley, David F. Brailsford, Matthew R. ...
ICDAR
2009
IEEE
PDF-TREX: An Approach for Recognizing and Extracting Tables from PDF Documents
This paper presents PDF-TREX, a heuristic approach for table recognition and extraction from PDF documents. The heuristic starts from an initial set of basic content elements an...
Ermelinda Oro, Massimo Ruffolo
ICDAR
2009
IEEE
User-Guided Wrapping of PDF Documents Using Graph Matching Techniques
There are a number of established products on the market for wrapping—semi-automatic navigation and extraction of data—from web pages. These solutions make use of the inherent...
Tamir Hassan