A Comparison of Clustering Methods for Word Image Indexing

15 years 8 months ago


In this paper we explore the effectiveness of three clustering methods for word image indexing: the Self-Organizing Map (SOM), the Growing Hierarchical Self-Organizing Map (GHSOM), and Spectral Clustering. We test these methods on a real data set composed of word images extracted from the pages of a 19th-century encyclopedia. In essence, the word images are stored in the clusters defined by each clustering method and are subsequently retrieved by identifying the cluster closest to a query word. The accuracy of the methods is compared by measuring the performance of the word retrieval algorithm developed in our previous work. From the experimental results we may conclude that methods designed to determine the number and structure of clusters automatically, such as GHSOM, are particularly well suited to our data set.
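The store-then-retrieve scheme described above can be illustrated with a minimal sketch. It assumes each word image has already been reduced to a fixed-length feature vector and that the cluster centers are given; plain nearest-centroid assignment stands in for SOM, GHSOM, or Spectral Clustering, and all names (`build_index`, `retrieve`, the toy vectors) are hypothetical rather than taken from the paper.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(centroids, vec):
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda i: euclidean(centroids[i], vec))


def build_index(centroids, word_vectors):
    """Store each word image id in the cluster whose centroid is closest."""
    index = defaultdict(list)
    for word_id, vec in word_vectors.items():
        index[nearest(centroids, vec)].append(word_id)
    return index


def retrieve(centroids, index, word_vectors, query_vec, top_k=3):
    """Find the cluster closest to the query, then rank only its members."""
    cluster = nearest(centroids, query_vec)
    members = index.get(cluster, [])
    ranked = sorted(members, key=lambda w: euclidean(word_vectors[w], query_vec))
    return ranked[:top_k]


# Toy data (assumed): two cluster centers and four word-image vectors.
centroids = [(0.0, 0.0), (10.0, 10.0)]
word_vectors = {
    "w1": (0.5, 0.2),
    "w2": (9.5, 10.1),
    "w3": (1.0, 1.0),
    "w4": (10.5, 9.0),
}

index = build_index(centroids, word_vectors)
result = retrieve(centroids, index, word_vectors, (9.0, 9.0))
print(result)
```

Because only the members of the closest cluster are ranked, the quality of retrieval depends directly on how well the clustering groups similar word images, which is the property the comparison in the paper evaluates.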

Simone Marinai, Emanuele Marino, Giovanni Soda
