The Portable Document Format (PDF) is a page-oriented, graphically rich document format based on PostScript semantics. It is the file format underlying the Adobe
This paper proposes a new method for document transformation using OCR to generate various XML documents from printed documents. The proposed method adopts a hierarchical transfor...
Existing methods for single document summarization usually make use of only the information contained in the specified document. This paper proposes the technique of document expa...
In this paper, a new method for binarizing document images or photos is presented. The method is simple, fast, and robust, and is appropriate for normal as well as for special cases ...
This paper presents a pair of identification techniques that automatically detect the scripts and orientations of document images suffering from various types of document degradation. ...