Project MEMORIAL [3] aims to develop a new technology for creating Web-based information systems using interactive electronic documents extracted from their paper originals...
Finding good representations of text documents is crucial in information retrieval and classification systems. Today the most popular document representation is based on a vector ...
This paper shows how Wikipedia and the semantic knowledge it contains can be exploited for document clustering. We first create a concept-based document representation b...
Anna Huang, David N. Milne, Eibe Frank, Ian H. Wit...
In this paper, a new efficient word spotting methodology is presented that can be applied to historical printed documents without requiring any previous block or word segmentation...
Several efficient and very powerful algorithms exist for detecting changes in tree-based textual documents, such as those encoded in XML. An important aspect is still underestimat...
Angelo Di Iorio, Michele Schirinzi, Fabio Vitali, ...