We describe a compression model for semistructured documents, called Structural Contexts Model (SCM), which takes advantage of the context information usually implicit in the stru...
Author identification is a text categorization task with applications in intelligence, criminal law, computer forensics, etc. Usually, in such cases there is a shortage of training t...
We present a fast compression and decompression technique for natural language texts. The novelty is that the exact search can be done on the compressed text directly, using any k...
Edleno Silva de Moura, Gonzalo Navarro, Nivio Zivi...
This paper presents a method for incorporating natural language processing into existing text categorization procedures. Three aspects are considered in the investigation: (i) a m...
Previous work on Natural Language Processing for Information Retrieval has shown the inadequacy of semantic and syntactic structures for both document retrieval and categoriza...