We describe a system for the retrieval on the basis of layout similarity of document images belonging to collections stored in digital libraries. Layout regions are extracted and ...
Abstract. A major problem encountered by text clustering practitioners is the difficulty of determining a priori which is the optimal text representation and clustering technique f...
A crucial preprocessing stage in applications such as OCR is text extraction from mixed-type documents. The present work, in contrast to most until now, successfully faces the pro...
−Document clustering has become an increasingly important task in analyzing huge numbers of documents distributed among various sites. The challenging aspect is to analyze this e...
Self-Organizing Maps capable of encoding structured information will be used for the clustering of XML documents. Documents formatted in XML are appropriately represented as graph ...
Markus Hagenbuchner, Alessandro Sperduti, Ah Chung...