We consider the problem of document categorization. The set of domains and the keywords specific to these domains are assumed to be selected beforehand as initial data. We apply...
Mikhail Alexandrov, Alexander F. Gelbukh, George L...
In this paper, we investigate the use of words and subwords (including both characters and syllables) in audio indexing for Mandarin Chinese spoken document retrieval. Two retrieva...
In this paper we describe a top-down approach to the segmentation and representation of documents containing tabular structures. Examples of these documents are invoices and techn...
Francesca Cesarini, Marco Gori, Simone Marinai, Gi...
In this paper we present and discuss a novel approach to modeling logical structures of documents, based on a statistical representation of patterns in a document class. An effic...
It has become very common in the current information society to talk about "open" and to use this term as a quality mark. Open standards, open source software, open arch...