This paper presents a dynamic approach to document page segmentation based on inter-component relationships and their local features. State-of-the-art page segmentation algorithms segment zones based on the separation properties of connected components, such as distance and orientation, and typically do not consider properties other than size. Our approach combines component separation with local features. The page is first over-segmented using fuzzy-Voronoi++, a dynamically adaptive scheme of separation features based on [2]. The separation features and each zone's content are then fed to a semi-supervised clustering algorithm that fuses the components based on their location and local features. Zone-based evaluation was performed on sets of printed and handwritten documents in English script with multiple font sizes, and we achieved an increase of 14% over the accuracy reported in [2].
Mudit Agrawal, David S. Doermann