Numerous approaches, including textual, structural and featural, to detecting duplicate documents have been investigated. Considering document images are usually stored and transm...
Word searching and indexing in historical document collections is a challenging problem because, characters in these documents are often touching or broken due to degradation/agei...
Mapping documents into an interlingual representation can help bridge the language barrier of a cross-lingual corpus. Previous approaches use aligned documents as training data to...
Abstract. In this paper, we address the problem of opinion analysis using a probabilistic approach to the underlying structure of different types of opinions or sentiments around ...
As XML has emerged as a data representation format and as great quantities of data have been stored in the XML format, XML document design has become an important and evident issu...