Automatic Extraction of Data Points and Text Blocks from 2-Dimensional Plots in Digital Documents



Two-dimensional (2-D) plots in digital documents on the web are an important but largely under-utilized source of information. In this paper, we outline how data and text can be extracted automatically from these plots, eliminating a time-consuming manual process. Our information extraction algorithm identifies the axes of a figure, extracts text blocks such as axis labels and legends, and identifies the data points in the figure. It also extracts the units that appear in the axis labels and segments each legend into its lines, the symbols they contain, and the text explanation associated with each symbol. The algorithm also handles the challenging task of separating overlapping text and data points. Our experiments indicate that these techniques are computationally efficient and provide acceptable accuracy.
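To make the pipeline in the abstract more concrete, here is a minimal sketch of three of its steps: locating the axes, collecting candidate data-point pixels, and splitting a unit off an axis label. It is not the authors' algorithm. The function names (`find_axes`, `data_pixels`, `split_label`), the binary-raster input, the "densest row and column" heuristic, and the bracketed-unit pattern are all assumptions chosen for illustration.

```python
import re


def find_axes(image):
    """Return (x_axis_row, y_axis_col) for a binary raster plot.

    Assumption (not from the paper): the axes are the longest straight
    runs of dark pixels, so the row and the column with the most dark
    pixels (value 1) are taken to be the x-axis and the y-axis.
    """
    row_counts = [sum(row) for row in image]
    col_counts = [sum(col) for col in zip(*image)]
    x_axis_row = max(range(len(row_counts)), key=row_counts.__getitem__)
    y_axis_col = max(range(len(col_counts)), key=col_counts.__getitem__)
    return x_axis_row, y_axis_col


def data_pixels(image, x_axis_row, y_axis_col):
    """List (row, col) of dark pixels that lie on neither axis.

    Assumption: anything dark that is not part of an axis line is a
    candidate data point. Text and tick marks are ignored here.
    """
    return [
        (r, c)
        for r, row in enumerate(image)
        for c, value in enumerate(row)
        if value and r != x_axis_row and c != y_axis_col
    ]


# Hypothetical label convention: "Name (unit)" or "Name [unit]".
UNIT_PATTERN = re.compile(
    r"^(?P<name>.*?)\s*[\(\[](?P<unit>[^\)\]]+)[\)\]]\s*$"
)


def split_label(label):
    """Split an axis label into (name, unit); unit is None if absent."""
    text = label.strip()
    match = UNIT_PATTERN.match(text)
    if match is None:
        return text, None
    return match.group("name"), match.group("unit").strip()


# A tiny 6x6 plot: y-axis in column 1, x-axis in row 4, three data points.
image = [
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0, 0],
]

x_row, y_col = find_axes(image)
points = data_pixels(image, x_row, y_col)
```

A real figure would need more than this sketch offers: skewed or broken axis lines, tick marks, and text that overlaps data points (the case the abstract calls challenging) all defeat a simple pixel count.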

Saurabh Kataria, William Browuer, Prasenjit Mitra, C. Lee Giles


2-D Plots | AAAI 2008 | Data Points | Identifies Data Points | Intelligent Agents


Post Info

Added	02 Oct 2010
Updated	02 Oct 2010
Type	Conference
Year	2008
Where	AAAI
Authors	Saurabh Kataria, William Browuer, Prasenjit Mitra, C. Lee Giles
