Catching the Drift: Using Feature-Free Case-Based Reasoning for Spam Filtering

Download www.cs.ucc.ie

In this paper, we compare case-based spam filters, focusing on their resilience to concept drift. In particular, we evaluate how to track concept drift using a case-based spam filter that uses a feature-free distance measure based on text compression. In our experiments, we compare two ways to normalise such a distance measure, finding that the one proposed in [1] performs better. We show that a policy as simple as retaining misclassified examples has a hugely beneficial effect on handling concept drift in spam but, on its own, it results in the case base growing by over 30%. We then compare two different retention policies and two different forgetting policies (one a form of instance selection, the other a form of instance weighting) and find that they perform roughly as well as each other while keeping the case base size constant. Finally, we compare a feature-based textual case-based spam filter with our feature-free approach. In the face of concept drift, the feature-based...
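
The ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the two normalisations it compares or say which one [1] proposes, so the sketch assumes two widely used ones, the normalised compression distance (NCD) and the compression-based dissimilarity measure (CDM). The use of zlib as the compressor, the nearest-neighbour vote, and the names `csize`, `CaseBase`, `classify` and `process` are likewise hypothetical choices made for illustration. Only the retention of misclassified examples is shown; the forgetting policies are left out.

```python
import zlib
from collections import Counter
from typing import Callable, List, Tuple


def csize(data: bytes) -> int:
    """Compressed size in bytes; zlib is an assumed stand-in compressor."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalised compression distance (one assumed normalisation)."""
    cx, cy, cxy = csize(x), csize(y), csize(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def cdm(x: bytes, y: bytes) -> float:
    """Compression-based dissimilarity measure (the other assumed normalisation)."""
    return csize(x + y) / (csize(x) + csize(y))


class CaseBase:
    """Hypothetical k-NN case base that retains misclassified messages."""

    def __init__(self, distance: Callable[[bytes, bytes], float] = cdm, k: int = 1):
        self.distance = distance
        self.k = k
        self.cases: List[Tuple[bytes, str]] = []  # (message, label) pairs

    def classify(self, msg: bytes) -> str:
        """Majority label among the k cases closest to msg."""
        if not self.cases:
            raise ValueError("case base is empty")
        ranked = sorted(self.cases, key=lambda case: self.distance(case[0], msg))
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]

    def process(self, msg: bytes, true_label: str) -> str:
        """Classify msg, then add it to the case base only if the prediction was wrong."""
        predicted = self.classify(msg)
        if predicted != true_label:
            self.cases.append((msg, true_label))
        return predicted
```

No features are extracted anywhere in this sketch: two messages count as close when compressing them together costs little more than compressing one of them alone. Because `process` only ever adds cases, the case base grows over time, which is the growth that the abstract's forgetting policies are meant to offset.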

Sarah Jane Delany, Derek G. Bridge

Case Base | Case-based Spam filter | Concept Drift | ICCBR 2007

Post Info

Added	08 Jun 2010
Updated	08 Jun 2010
Type	Conference
Year	2007
Where	ICCBR
Authors	Sarah Jane Delany, Derek G. Bridge
