Numerous approaches, including textual, structural and featural, to detecting duplicate documents have been investigated. Considering document images are usually stored and transm...
Accessing the structured content of PDF document is a difficult task, requiring pre-processing and reverse engineering techniques. In this paper, we first present different methods...