We present a novel system for automatically marking up text documents into XML and discuss the benefits of XML markup for intelligent information retrieval. The system uses the Se...
This paper examines several different approaches to exploiting structural information in semi-structured document categorization. The methods under consideration are designed for ...
Text extraction in mixed-type documents is a necessary pre-processing stage for many document applications. In mixed-type color documents, text, drawings and graphics appear w...