Semistatic word-based byte-oriented compression codes are known to be attractive alternatives to compress natural language texts. With compression ratios around 30%, they allow di...
Spam filtering poses a special problem in text categorization, of which the defining characteristic is that filters face an active adversary, which constantly attempts to evade fi...
Andrej Bratko, Gordon V. Cormack, Bogdan Filipic, ...
We address the problem of approximate string matching in two dimensions, that is, to nd a pattern of size m m in a text of size n n with at most k errors (substitutions, insertions...
Lossless compression researchers have developed highly sophisticated approaches, such as Huffman encoding, arithmetic encoding, the Lempel-Ziv family, Dynamic Markov Compression (D...
Fauzia S. Awan, Nan Zhang 0005, Nitin Motgi, Raja ...
Classification of texts potentially containing a complex and specific terminology requires the use of learning methods that do not rely on extensive feature engineering. In this w...