In the traditional setting, text categorization is formulated as a concept learning problem where each instance is a single isolated document. However, this perspective is not appr...
Metadata plays an important role in discovering, collecting, extracting and aggregating Web data. This paper proposes a method of constructing metadata for a specific topic. The m...
Many old manuscript documents were written on both sides of the paper, and the bleed-through from one side of the document to the other increases the difficulty in reading or deci...
We present methods for eliminating or reducing the distortion in a scanned image. Aspects of the present paper allow for the automatic pruning, de-skewing, and unwarping of an ima...
We present a method for picture detection in document page images, which can come from scanned or camera images, or rendered from electronic file formats. Our method uses OCR to s...