Extraction, layout analysis and classification of diagrams in PDF documents

Diagrams are a critical part of virtually all scientific and technical documents. Analyzing diagrams will be important for building comprehensive document retrieval systems. This paper focuses on the extraction and classification of diagrams from PDF documents. We study diagrams available in vector (not raster) format in online research papers. PDF files are parsed and their vector graphics components installed in a spatial index. Subdiagrams are found by analyzing white space gaps. A set of statistics is generated for each diagram, e.g., the number of horizontal lines and vertical lines. The statistics form a feature vector description of the diagram. The vectors are used in a kernel-based machine learning system (Support Vector Machine). Separating a set of bar graphs from non-bar-graphs gathered from 20,000 biology research papers gave a classification accuracy of
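The white-space and line-count steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the vector graphics have already been reduced to straight line segments, it looks for gaps along the x axis only instead of querying a spatial index, and the names `Segment`, `split_on_vertical_gaps`, `feature_vector`, `min_gap` and `tol` are hypothetical. The SVM step is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """A straight line segment from (x0, y0) to (x1, y1); hypothetical type."""
    x0: float
    y0: float
    x1: float
    y1: float


def split_on_vertical_gaps(segments: List[Segment], min_gap: float = 10.0) -> List[List[Segment]]:
    """Group segments into candidate subdiagrams.

    A new group starts whenever an empty vertical strip wider than
    min_gap separates a segment from everything to its left.
    (Assumption: a 1-D sweep stands in for the paper's spatial index.)
    """
    ordered = sorted(segments, key=lambda s: min(s.x0, s.x1))
    groups: List[List[Segment]] = []
    right_edge = None  # rightmost x covered by the current group
    for seg in ordered:
        left, right = min(seg.x0, seg.x1), max(seg.x0, seg.x1)
        if right_edge is None or left - right_edge > min_gap:
            groups.append([seg])
            right_edge = right
        else:
            groups[-1].append(seg)
            right_edge = max(right_edge, right)
    return groups


def feature_vector(segments: List[Segment], tol: float = 0.5) -> List[int]:
    """Return [horizontal count, vertical count, other count] for one diagram."""
    horizontal = sum(
        1 for s in segments
        if abs(s.y0 - s.y1) <= tol and abs(s.x0 - s.x1) > tol
    )
    vertical = sum(
        1 for s in segments
        if abs(s.x0 - s.x1) <= tol and abs(s.y0 - s.y1) > tol
    )
    return [horizontal, vertical, len(segments) - horizontal - vertical]


if __name__ == "__main__":
    # Made-up page content: a small bar chart on the left, a sloped line plot on the right.
    page = [
        Segment(0, 0, 50, 0), Segment(0, 0, 0, 40),        # axes
        Segment(10, 0, 10, 25), Segment(10, 25, 20, 25),   # one bar
        Segment(20, 0, 20, 25),
        Segment(100, 0, 150, 0), Segment(100, 0, 150, 30),  # second subdiagram
    ]
    for group in split_on_vertical_gaps(page):
        print(feature_vector(group))
```

Vectors like these could then be passed to any SVM implementation for the bar-graph versus non-bar-graph decision; the abstract lists horizontal and vertical line counts only as examples, so the full feature set used in the paper may be larger.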

Robert P. Futrelle, Mingyan Shao, Chris Cieslik, Andrea Elaina Grimes

Diagrams | Document Analysis | ICDAR 2003 | Research Papers | Vector Graphics Components |

Post Info

Added	04 Jul 2010
Updated	04 Jul 2010
Type	Conference
Year	2003
Where	ICDAR
Authors	Robert P. Futrelle, Mingyan Shao, Chris Cieslik, Andrea Elaina Grimes
