Information available on the Internet is frequently supplied simply as plain ASCII text, structured according to orthographic and semantic conventions. Traditional document classi...
In Africa, there are a number of languages with their own indigenous scripts. This paper presents an OCR system for Amharic script. Amharic is the official and working language of Ethio...
GOOD is a tailor-made, fully integrated publishing system that creates output documents for multiple media types used in both online and offline teaching modes at the University of...
Jacek Radajewski, Sally MacFarlane, Stijn Dekeyser
This paper describes a new XML compression scheme that offers both high compression ratios and short query response times. Its core is a fully reversible transform featuring substit...
(Automatic) document classification is generally defined as content-based assignment of one or more predefined categories to documents. Usually, machine learning, statistical patt...