ICDAR
2007
IEEE

Simultaneous Layout Style and Logical Entity Recognition in a Heterogeneous Collection of Documents

Logical entity recognition in heterogeneous collections of document page images remains a challenging problem, since the performance of traditional supervised methods degrades dramatically when many distinct layout styles are present. In this paper we present an unsupervised method in which layout style information is explicitly used in both the training and recognition phases. We represent the layout style, local features, and logical labels of the physical regions of a document compactly by an ordered labeled X-Y tree. The style dissimilarity of two document pages is represented by the distance between their representing trees. During the training phase, document pages with true logical labels in the training set are classified into distinct layout styles by unsupervised clustering. During the recognition phase, the layout style and logical entities of an input document are recognized simultaneously by matching the input tree to the trees in the closest-matched layout style cluster of the training set. The experimental re...
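
The pipeline described in the abstract (tree representation, tree distance, style clustering, then matching against the closest cluster) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the `Node` fields, the top-down edit-style `tree_distance`, the threshold-based `cluster_styles`, and the label transfer via `leaf_labels` are all assumptions chosen for illustration, and the distance here compares only cut directions, ignoring local features.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical ordered labeled X-Y tree node (fields are assumptions)."""
    cut: str                       # "H" or "V" cut direction, or "leaf"
    label: str = ""                # logical label such as "title"; empty if unknown
    children: List["Node"] = field(default_factory=list)


def size(t: Node) -> int:
    """Number of nodes in the subtree rooted at t."""
    return 1 + sum(size(c) for c in t.children)


def tree_distance(a: Node, b: Node) -> int:
    """Illustrative top-down ordered tree distance (a stand-in, not the paper's).

    Root mismatch costs 1; child sequences are aligned by edit distance,
    where inserting or deleting a child costs the size of its subtree.
    """
    m, n = len(a.children), len(b.children)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + size(a.children[i - 1])
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + size(b.children[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + size(a.children[i - 1]),
                d[i][j - 1] + size(b.children[j - 1]),
                d[i - 1][j - 1] + tree_distance(a.children[i - 1], b.children[j - 1]),
            )
    return (0 if a.cut == b.cut else 1) + d[m][n]


def cluster_styles(trees: List[Node], threshold: int) -> List[List[Node]]:
    """Simple leader clustering: join the first cluster whose leader is close enough."""
    clusters: List[List[Node]] = []
    for t in trees:
        for c in clusters:
            if tree_distance(t, c[0]) <= threshold:
                c.append(t)
                break
        else:
            clusters.append([t])
    return clusters


def recognize(page: Node, clusters: List[List[Node]]) -> Tuple[int, Node]:
    """Return the closest style cluster index and its best-matching training tree."""
    style = min(
        range(len(clusters)),
        key=lambda k: min(tree_distance(page, t) for t in clusters[k]),
    )
    best = min(clusters[style], key=lambda t: tree_distance(page, t))
    return style, best


def leaf_labels(t: Node) -> List[str]:
    """Logical labels of the leaves, in left-to-right order."""
    if not t.children:
        return [t.label]
    return [lab for c in t.children for lab in leaf_labels(c)]


# Toy training set: two pages in one style, one page in another.
a = Node("H", children=[Node("leaf", "title"), Node("leaf", "body")])
a2 = Node("H", children=[Node("leaf", "title"), Node("leaf", "body")])
b = Node("V", children=[Node("leaf", "author"), Node("leaf", "abstract"), Node("leaf", "body")])
clusters = cluster_styles([a, b, a2], threshold=1)

# Unlabeled input page: style and labels are obtained in one matching step.
page = Node("H", children=[Node("leaf"), Node("leaf")])
style, best = recognize(page, clusters)
predicted = leaf_labels(best)
```

In this toy setup the input page shares its layout with the first cluster, so its labels are borrowed from the matched training tree. A real system would need a distance that also accounts for local region features and a proper node-to-node mapping for label transfer.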
S. Chen, S. Mao, G. Thoma
Added 03 Jun 2010
Updated 03 Jun 2010
Type Conference
Year 2007
Where ICDAR
Authors S. Chen, S. Mao, G. Thoma