Building authoring applications is a tedious and complex task that requires a high programming effort. Document technologies, especially XML based ones, can help in reducing such ...
Statistical approaches to document indexing and retrieval date back to the beginning of automation. This paper considers early ideas, how they developed, their status now, and the...
Typographic and visual information is an integral part of textual documents. Most information extraction systems ignore most of this visual information, processing the text as a l...
User queries on extensible markup language (XML) documents are typically expressed as regular path expressions. A variety of indexing techniques for efficiently retrieving the re...
Automatic text categorization is a problem of automatically assigning text documents to predefined categories. In order to classify text documents, we must extract good features f...