XML is fast becoming the standard format to store, exchange and publish over the web, and is getting embedded in applications. Two challenges in handling XML are its size (the XML...
Paolo Ferragina, Fabrizio Luccio, Giovanni Manzini...
The inclusion of document length factors has been a major topic in the development of retrieval models. We believe that current models can be further improved by more refined est...
A hierarchical algorithm is presented for determining the similarity and equivalence of document images. Features extracted from the CCITT fax-compressed representations of two im...
We present a fast compression and decompression scheme for natural language texts that allows efficient and flexible string matching by searching the compressed text directly....
Edleno Silva de Moura, Gonzalo Navarro, Nivio Zivi...
In this paper we present a new segmentation method for the Multidimensional Multiscale Parser (MMP) algorithm. In previous works we have shown that, for text and compound images, ...
Eduardo A. B. da Silva, Manuel J. C. S. Reis, Muri...