We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
This article provides description about the grammar checking system developed for detecting various grammatical errors in Punjabi texts. This system utilizes a fullform lexicon fo...
This paper presents a novel approach for the multi-oriented text line extraction from historical handwritten Arabic documents. Because of the multi-orientation of lines and their ...
(Automatic) document classification is generally defined as content-based assignment of one or more predefined categories to documents. Usually, machine learning, statistical patt...
As large quantity of document images is getting archived by the digital libraries, there is a need for an efficient search strategies to make them available as per users informatio...