In this paper, we propose a word shape recognition method for retrieving image-based documents. Document images are segmented at the word level first. Then the proposed method det...
As online document collections continue to expand, both on the Web and in proprietary environments, the need for duplicate detection becomes more critical. Few users wish to retri...
Abstract Current, data-driven applications have become more dynamic in nature, with the need to respond to events generated from distributed sources or to react to information extr...
Plagiarism detection has been considered as a classification problem which can be approximated with intrinsic strategies, considering self-based information from a given document,...
Gabriel Oberreuter, Gaston L'Huillier, Sebasti&aac...
This paper presents a transaction-time HTTP server, called ? Apache that supports document versioning. A document often consists of a main file formatted in HTML or XML and severa...