Sciweavers

CEAS
2004
Springer

The Impact of Feature Selection on Signature-Driven Spam Detection

Signature-driven spam detection provides an alternative to machine learning approaches and can be very effective when near-duplicates of essentially the same message are sent in high volume [20]. Unfortunately, signatures can also be brittle to small alterations of message content. In this work we propose a technique for increasing signature robustness, targeting the I-Match algorithm [6] but applicable to other single-signature detection schemes. The proposed method is shown to consistently outperform traditional I-Match in the spam filtering application. Because I-Match signature quality and stability depend on vocabulary control, we compare the traditional Zipfian approaches to feature selection with techniques typically applied in text categorization, which are found to provide viable alternatives. In particular, distributional word clustering is demonstrated to be effective in increasing signature robustness.
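The dependence of a single-signature scheme on vocabulary control can be illustrated with a minimal sketch. It assumes the commonly described form of I-Match: keep only the tokens found in a chosen lexicon, sort the unique survivors, and hash them. The function name imatch_signature, the toy lexicon, the regex tokenizer, and the use of SHA-1 are all assumptions made for illustration and are not taken from the paper.

```python
import hashlib
import re


def imatch_signature(text, lexicon):
    """Hash the sorted set of lexicon terms found in text.

    Illustrative sketch only: the tokenizer, lexicon, and hash function
    are assumptions, not the configuration used by the authors.
    Returns None when no lexicon term occurs in the text.
    """
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    kept = sorted(tokens & lexicon)
    if not kept:
        return None
    return hashlib.sha1(" ".join(kept).encode("utf-8")).hexdigest()


# Hypothetical lexicon; in practice it results from feature selection.
lexicon = {"cheap", "pills", "order", "online", "discount"}

sig_a = imatch_signature("Order cheap pills online today!!!", lexicon)
sig_b = imatch_signature("order CHEAP pills ... online, today", lexicon)
sig_c = imatch_signature("Order cheap pills online today, big discount", lexicon)

# Changes outside the lexicon (case, punctuation, word order) keep the signature.
print(sig_a == sig_b)
# Adding a single in-lexicon word changes the signature.
print(sig_a == sig_c)
```

In this sketch the first two messages share a signature because they differ only in tokens outside the lexicon, while the third produces a different one because it adds a single lexicon word. That is the kind of brittleness the abstract refers to, and it shows why the choice of lexicon matters.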
Aleksander Kolcz, Abdur Chowdhury, Joshua Alspector
Added 01 Jul 2010
Updated 01 Jul 2010
Type Conference
Year 2004
Where CEAS
Authors Aleksander Kolcz, Abdur Chowdhury, Joshua Alspector