PKDD
2004
Springer

Combining Winnow and Orthogonal Sparse Bigrams for Incremental Spam Filtering

Spam filtering is a text categorization task that has attracted significant attention due to the ever-increasing volume of junk email on the Internet. While current best-practice systems use Naive Bayes filtering and other probabilistic methods, we propose using a statistical but non-probabilistic classifier based on the Winnow algorithm. The feature space considered by most current methods is either limited in expressivity or imposes a large computational cost. We introduce orthogonal sparse bigrams (OSB) as a feature combination technique that overcomes both of these weaknesses. By combining Winnow and OSB with refined preprocessing and tokenization techniques, we reach an accuracy of 99.68% on a difficult test corpus, compared to 98.88% previously reported by the CRM114 classifier on the same test corpus.
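The abstract names two techniques, and a minimal Python sketch can show how they might fit together. It assumes OSB features built over a window of five tokens, where each token is paired with each of the next four and the number of skipped positions is recorded (so "A <skip> C" is a different feature from "A C"), and a plain mistake-driven Winnow classifier that scores a message by the mean weight of its distinct active features. The names `osb_features` and `Winnow`, the `<skip>` marker, whitespace tokenization, and the threshold, promotion and demotion values are illustrative assumptions, not the paper's exact configuration; the refined preprocessing and tokenization mentioned above are omitted.

```python
def osb_features(tokens, window=5):
    """Orthogonal sparse bigrams (sketch): pair each token with each of the
    next window-1 tokens, marking every skipped position with <skip>."""
    features = []
    for i, first in enumerate(tokens):
        for dist in range(1, window):
            j = i + dist
            if j >= len(tokens):
                break
            # dist - 1 skipped positions lie between the two paired tokens
            skips = " ".join(["<skip>"] * (dist - 1))
            middle = f" {skips} " if skips else " "
            features.append(f"{first}{middle}{tokens[j]}")
    return features


class Winnow:
    """Mistake-driven Winnow (sketch). Parameter values are assumptions."""

    def __init__(self, threshold=1.0, promotion=1.23, demotion=0.83):
        self.threshold = threshold
        self.promotion = promotion  # factor > 1 applied on a missed spam
        self.demotion = demotion    # factor < 1 applied on a false alarm
        self.weights = {}           # unseen features implicitly weigh 1.0

    def score(self, features):
        # Mean weight over the distinct active features of the message
        active = set(features)
        if not active:
            return 0.0
        return sum(self.weights.get(f, 1.0) for f in active) / len(active)

    def predict(self, features):
        # True means "spam"
        return self.score(features) > self.threshold

    def train(self, features, is_spam):
        """Update weights only when the current prediction is wrong."""
        if self.predict(features) == is_spam:
            return False
        factor = self.promotion if is_spam else self.demotion
        for f in set(features):
            self.weights[f] = self.weights.get(f, 1.0) * factor
        return True


clf = Winnow()
spam = osb_features("buy cheap pills now".split())
ham = osb_features("meeting notes attached for review".split())
for _ in range(3):
    clf.train(spam, True)
    clf.train(ham, False)
print(clf.predict(spam), clf.predict(ham))  # prints "True False"
```

A new message whose features are all unseen scores exactly 1.0, which does not exceed the threshold, so the sketch defaults to "not spam" until training promotes spam features. Updating only on mistakes keeps training incremental: each message can be classified and, if misclassified, corrected in a single pass.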
Type Conference
Year 2004
Where PKDD
Authors Christian Siefkes, Fidelis Assis, Shalendra Chhabra, William S. Yerazunis