Abstract We present a fast compression and decompression scheme for natural language texts that allows efficient and flexible string matching by searching the compressed text directly....
Edleno Silva de Moura, Gonzalo Navarro, Nivio Zivi...
Automated text categorization is an important technique for many web applications, such as document indexing, document filtering, and cataloging web resources. Many different appr...
The selection of indexing terms for representing documents is a key decision that limits how effective subsequent retrieval can be. Often stemming algorithms are used to normaliz...
Web search engines consistently collect information about users' interactions with the system: they record the queries users issued, the URLs of presented and selected documents along w...
Background: We present a probabilistic topic-based model for content similarity called pmra that underlies the related article search feature in PubMed. Whether or not a document ...