CEAS
2006
Springer

An Adaptive, Semi-Structured Language Model Approach to Spam Filtering on a New Corpus

Motivated by current efforts to construct more realistic spam filtering experimental corpora, we present a newly assembled, publicly available corpus of genuine and unsolicited (spam) email, dubbed GenSpam. We also propose an adaptive model for semi-structured document classification based on language model component interpolation. We compare this with a number of alternative classification models, and report promising results on the spam filtering task using a specifically assembled test set to be released as part of the GenSpam corpus.
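To make "language model component interpolation" concrete, here is a minimal sketch of one way such a classifier could look. It is not the paper's formulation: the field names (`subject`, `body`), the add-one smoothed unigram models, the uniform class prior, and the fixed interpolation weights are all assumptions chosen for illustration, and an adaptive variant would tune the weights on held-out data rather than fix them.

```python
import math
from collections import Counter

# Hypothetical message fields; the abstract does not name the structure used.
FIELDS = ("subject", "body")


def train(docs):
    """docs: iterable of (label, {field: text}). Returns counts[label][field] -> Counter."""
    counts = {}
    for label, fields in docs:
        per_field = counts.setdefault(label, {f: Counter() for f in FIELDS})
        for f in FIELDS:
            per_field[f].update(fields.get(f, "").lower().split())
    return counts


def field_log_likelihood(tokens, counter, vocab_size):
    """Log-probability of tokens under an add-one smoothed unigram model."""
    total = sum(counter.values())
    return sum(math.log((counter[t] + 1) / (total + vocab_size)) for t in tokens)


def field_posterior(tokens, counts, field, vocab_size):
    """P(class | field text), assuming a uniform class prior."""
    logs = {c: field_log_likelihood(tokens, counts[c][field], vocab_size) for c in counts}
    top = max(logs.values())  # subtract the max before exponentiating, for stability
    expd = {c: math.exp(v - top) for c, v in logs.items()}
    z = sum(expd.values())
    return {c: v / z for c, v in expd.items()}


def classify(fields, counts, weights):
    """Linearly interpolate per-field class posteriors with the given weights."""
    vocab = {t for c in counts for f in FIELDS for t in counts[c][f]}
    vocab_size = len(vocab) + 1  # one extra slot for unseen tokens
    scores = {c: 0.0 for c in counts}
    for f in FIELDS:
        tokens = fields.get(f, "").lower().split()
        posterior = field_posterior(tokens, counts, f, vocab_size)
        for c in counts:
            scores[c] += weights[f] * posterior[c]
    return max(scores, key=scores.get), scores
```

With weights that sum to 1, the returned scores form a distribution over classes, so a call such as `classify(message, counts, {"subject": 0.4, "body": 0.6})` yields both a label and a confidence-like score per class. The weights shown are placeholders, not values from the paper.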
Ben Medlock
Added 20 Aug 2010
Updated 20 Aug 2010
Type Conference
Year 2006
Where CEAS
Authors Ben Medlock