Abstract. This tutorial describes the rationale for and use of an ontology for extensive semantic interrelation of documents to increase "sustainable development", i.e. c...
This work belongs to the domain of Electronic Document Management (EDM) [1]. A document can be an electronic text, an image, a sound file, a network protocol message, a set of da...
The Cross-Language Information Retrieval community has produced search engines over multilingual corpora and multilingual text categorization systems. In this paper, we focus on th...
Information extraction has conventionally treated HTML pages as plain-text documents extended with HTML tags. However, the growing maturity and correct usage of HTML/XHT...
(Automatic) document classification is generally defined as the content-based assignment of one or more predefined categories to documents. Usually, machine learning, statistical patt...