Spam Filtering Using Statistical Data Compression Models


Spam filtering poses a special problem in text categorization: its defining characteristic is that filters face an active adversary that constantly attempts to evade filtering. Since spam evolves continuously and most practical applications are based on online user feedback, the task calls for fast, incremental and robust learning algorithms. In this paper, we investigate a novel approach to spam filtering based on adaptive statistical data compression models. The nature of these models allows them to be employed as probabilistic text classifiers based on character-level or binary sequences. Because messages are modeled as sequences, tokenization and other error-prone preprocessing steps are omitted altogether, resulting in a method that is very robust. The models are also fast to construct and incrementally updateable. We evaluate the filtering performance of two different compression algorithms: dynamic Markov compression and prediction by partial matching. The results of our emp...
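To make the idea in the abstract concrete, here is a minimal sketch of classification by compression: one adaptive character-level model per class, with a message assigned to the class whose model would encode it in fewer bits. This is an illustration under stated assumptions, not the paper's method. It uses a fixed order-k character model with add-one smoothing as a simple stand-in for dynamic Markov compression or prediction by partial matching, and the names CharModel, update, code_length and classify are hypothetical.

```python
import math
from collections import defaultdict


class CharModel:
    """Order-k character model with add-one smoothing.

    A simplified stand-in for an adaptive compression model;
    it is NOT an implementation of PPM or DMC.
    """

    def __init__(self, order=3, alphabet_size=256):
        self.order = order
        self.alphabet_size = alphabet_size  # assumed symbol count for smoothing
        self.counts = defaultdict(lambda: defaultdict(int))
        self.totals = defaultdict(int)

    def update(self, text):
        """Incrementally add one message's statistics to the model."""
        padded = " " * self.order + text
        for i in range(self.order, len(padded)):
            ctx, ch = padded[i - self.order:i], padded[i]
            self.counts[ctx][ch] += 1
            self.totals[ctx] += 1

    def code_length(self, text):
        """Approximate number of bits needed to encode text under this model."""
        padded = " " * self.order + text
        bits = 0.0
        for i in range(self.order, len(padded)):
            ctx, ch = padded[i - self.order:i], padded[i]
            seen = self.counts.get(ctx, {}).get(ch, 0)
            total = self.totals.get(ctx, 0)
            p = (seen + 1) / (total + self.alphabet_size)
            bits -= math.log2(p)
        return bits


def classify(message, spam_model, ham_model):
    """Pick the class whose model compresses the message better."""
    if spam_model.code_length(message) < ham_model.code_length(message):
        return "spam"
    return "ham"


# Toy training data, invented purely for illustration.
spam_model, ham_model = CharModel(), CharModel()
for msg in ["buy cheap pills now", "cheap pills buy now online"]:
    spam_model.update(msg)
for msg in ["meeting at noon tomorrow", "see you at the meeting"]:
    ham_model.update(msg)

print(classify("cheap pills online", spam_model, ham_model))
print(classify("meeting tomorrow at noon", spam_model, ham_model))
```

No tokenizer appears anywhere in the sketch: the models see raw character sequences, and user feedback on a new message can be folded in with a single update call rather than by retraining from scratch.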

Andrej Bratko, Gordon V. Cormack, Bogdan Filipic, Thomas R. Lynam, Blaz Zupan


Compression Models | Data Compression Models | JMLR 2006 | Robust Learning Algorithms


» Spam decisions on gray email using personalized ontologies

» Filtering Spam in Social Tagging System with Dynamic Behavior Analysis

» On Attacking Statistical Spam Filters

» A Comparison of Event Models for Naive Bayes Anti-Spam E-Mail Filtering

» Spam Email Filtering Using Network-Level Properties

» Combining Winnow and Orthogonal Sparse Bigrams for Incremental Spam Filtering

» Training on errors experiment to detect fault-prone software modules by spam filter

» Partitioned logistic regression for spam filtering

Post Info

Added	13 Dec 2010
Updated	13 Dec 2010
Type	Journal
Year	2006
Where	JMLR
Authors	Andrej Bratko, Gordon V. Cormack, Bogdan Filipic, Thomas R. Lynam, Blaz Zupan
