This article presents Xed, a reverse engineering tool for PDF documents, which extracts the original document layout structure. Xed mixes electronic extraction methods with state-...
Most text analysis is designed to deal with the concept of a “document”, namely a cohesive presentation of thought on a unifying subject. By contrast, individual nodes on the ...
In this paper we present an integrated approach for semantic structure extraction in document images. Document images are initially processed to extract both their layout and logic...
Proper display and accurate recognition of document images are often hampered by degradations caused by poor scanning or transmission conditions. We propose a method to enhance su...
Conventional document search techniques are constrained by attempting to match individual keywords or phrases to source documents. Thus, these techniques miss out documents that co...