Sciweavers

ICDAR
2009
IEEE

Analysis of Book Documents' Table of Content Based on Clustering

13 years 9 months ago
Analysis of Book Documents' Table of Content Based on Clustering
Table of contents (TOC) recognition has attracted a great deal of attention in recent years. After reviewing the merits and drawbacks of the existing TOC recognition methods, we have observed that book documents are multi-page documents with intrinsic local format consistency. Based on this finding we introduce an automatic TOC analysis method through clustering. This method first detects the decorative elements in TOC pages. Then it learns a layout model used in the TOC pages through clustering. Finally, it generates TOC entries and extracts their hierarchical structure under the guidance of the model. More specifically, broken lines are taken into account in the method. Experimental results show that this method achieves high accuracy and efficiency. In addition, this method has been successfully applied in a commercial E-book production software package.
Liangcai Gao, Zhi Tang, Xiaofan Lin, Xin Tao, Yimi
Added 18 Feb 2011
Updated 18 Feb 2011
Type Journal
Year 2009
Where ICDAR
Authors Liangcai Gao, Zhi Tang, Xiaofan Lin, Xin Tao, Yimin Chu
Comments (0)