A robust front page detection algorithm for large periodical collections

Large-scale digitization projects aimed at periodicals often have as input streams of completely unlabeled document images. In such situations, the results produced by the automatic segmentation of the document stream into issues heavily influence the overall output quality of a document image analysis system. As a solution to the issue segmentation problem, this paper introduces a robust, two-step front page detection algorithm. First, the salient connected components from the front page of the periodical are described using a multi-dimensional Gaussian distribution based on discrete cosine transform (DCT) features. Second, a graph model is computed by applying Delaunay triangulation on the selected set of components. A specialized, error-tolerant graph matching algorithm is used to compute the distance score between the model and each candidate page. Experiments on a large, real-world newspaper data set demonstrate the generality and effectiveness of the proposed method.
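
As a rough illustration of the first step only, the sketch below computes low-frequency DCT coefficients for small component patches, fits a Gaussian to the patches of one component seen across known front pages, and scores candidate patches by Mahalanobis distance. It is not the authors' implementation. The 4x4 patches, the function names, the choice of the 2x2 lowest-frequency coefficients, the diagonal covariance, and the variance floor are all assumptions made for illustration. The Delaunay graph model and the error-tolerant graph matching of the second step are not covered.

```python
import math


def dct2(block):
    """Naive orthonormal 2-D DCT-II of a square block (list of rows)."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos(math.pi * (2 * x + 1) * u / (2 * n))
                          * math.cos(math.pi * (2 * y + 1) * v / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def dct_features(block, k=2):
    """Keep the k x k lowest-frequency coefficients (k is an assumption)."""
    coeffs = dct2(block)
    return [coeffs[u][v] for u in range(k) for v in range(k)]


def fit_diagonal_gaussian(samples, var_floor=1.0):
    """Per-dimension mean and variance; the diagonal covariance and the
    variance floor are simplifying assumptions, not taken from the paper."""
    m, d = len(samples), len(samples[0])
    mean = [sum(s[i] for s in samples) / m for i in range(d)]
    var = [sum((s[i] - mean[i]) ** 2 for s in samples) / m + var_floor
           for i in range(d)]
    return mean, var


def mahalanobis(x, mean, var):
    """Mahalanobis distance under a diagonal covariance."""
    return math.sqrt(sum((xi - mi) ** 2 / vi
                         for xi, mi, vi in zip(x, mean, var)))


def brighten(block, delta):
    """Hypothetical helper: simulate scan-to-scan intensity variation."""
    return [[min(255, p + delta) for p in row] for row in block]


# Toy 4x4 patch with a vertical edge, standing in for a masthead component.
base = [[0, 0, 255, 255] for _ in range(4)]
training = [dct_features(brighten(base, d)) for d in (0, 5, 10)]
mean, var = fit_diagonal_gaussian(training)

# A slightly different scan of the same component vs. a horizontal edge.
similar = dct_features(brighten(base, 7))
different = dct_features([[0] * 4, [0] * 4, [255] * 4, [255] * 4])

d_similar = mahalanobis(similar, mean, var)
d_different = mahalanobis(different, mean, var)
print(d_similar, d_different)
```

In this toy setup the patch from the same component lands much closer to the fitted model than the unrelated patch, which is the kind of separation a per-component score would need before any graph-level matching is attempted.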

Iuliu Vasile Konya, Christoph Seibert, Sebastian Glahn, Stefan Eickeler

Computer Vision | Document Image | ICPR 2008 | Issue Segmentation Problem | Unlabeled Document Images

Post Info

Added	30 May 2010
Updated	30 May 2010
Type	Conference
Year	2008
Where	ICPR
Authors	Iuliu Vasile Konya, Christoph Seibert, Sebastian Glahn, Stefan Eickeler
