—Content-based document image retrieval is a new and promising research area. Without OCR, document indexing directly based on image content is more general and convenient. Howev...
Abstract. We propose an approach for efficient word retrieval from printed documents belonging to Digital Libraries. The approach combines word image clustering (based on Self Orga...
Documents in many corpora, such as digital libraries and webpages, contain both content and link information. To explicitly consider the document relations represented by links, i...
This work presents a methodology for grouping structurally similar XML documents using clustering algorithms. Modeling XML documents with tree-like structures, we face the ‘clust...
Theodore Dalamagas, Tao Cheng, Klaas-Jan Winkel, T...
We consider the problem of organizing and browsing the top ranked portion of the documents returned by an information retrieval system. We study the effectiveness of a document o...