We developed a prototype for integrated retrieval and aggregation of diverse information contained in scanned paper documents. Such complex document information processing combines several forms of image processing together with textual/linguistic processing to enable effective analysis of complex document collections, a necessity for a wide range of applications. This is the first system to attempt integrated retrieval from complex documents; we report its current capabilities. Categories and Subject Descriptors H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval Keywords CDIP, Document image processing, Integrated retrieval
Shlomo Argamon, Gady Agam, Ophir Frieder, David A.