Spam filtering poses a special problem in text categorization, of which the defining characteristic is that filters face an active adversary, which constantly attempts to evade fi...
Andrej Bratko, Gordon V. Cormack, Bogdan Filipic, ...
Classification of texts potentially containing a complex and specific terminology requires the use of learning methods that do not rely on extensive feature engineering. In this w...
This paper improves the Tagged Suboptimal Codes (TSC) compression scheme in several ways. We show how to process the TSC as a universal code. We introduce the TSCk as a family of ...
—The Burrows-Wheeler Transform (BWT) is the basis for many of the most effective compression and selfindexing methods used today. A key to the versatility of the BWT is the abili...
Matthias Petri, Gonzalo Navarro, J. Shane Culpeppe...
Multi-document summarization aims to create a compressed summary while retaining the main characteristics of the original set of documents. Many approaches use statistics and mach...
Dingding Wang, Tao Li, Shenghuo Zhu, Chris H. Q. D...