The problem of document categorization is considered. The set of domains and the keywords specific to each domain are assumed to be selected beforehand as initial data. We apply...
Mikhail Alexandrov, Alexander F. Gelbukh, George L...
XML has undoubtedly become a standard for data representation and manipulation. However, most XML documents are still created without the respective description of their structu...
Traditional software process environments store documents using either a centralized or a distributed approach. With the assistance of a web agent, this paper presents a new document st...
generally meta-data, so that documents on any specific subject can be transparently retrieved. While quality control can in principle still rely on the traditional methods of peer-...
Document clustering techniques mostly rely on single term analysis of the document data set, such as the Vector Space Model. To better capture the structure of documents, the unde...
For documents with complex or atypical annotations, multihierarchical structures play the role of the document tree in traditional XML documents. We define a model of overlapping...
Term weighting is one of the most important aspects of modern Web retrieval systems. The weight associated with a given term in a document shows the importance of the ter...
We consider the incremental validation of updates on XML documents. When a valid XML document (i.e., one satisfying some constraints) is updated, it has to be verified that the n...
As online document collections continue to expand, both on the Web and in proprietary environments, the need for duplicate detection becomes more critical. Few users wish to retri...
Term-based representations of documents have found widespread use in information retrieval. However, one of the main shortcomings of such methods is that they largely disregard le...