Spam filters: Bayes vs. chi-squared; letters vs. words

We compare two statistical methods for identifying spam or junk electronic mail. Spam filters are classifiers which determine whether an email is junk or not. The proliferation of spam email has made electronic filtering vitally important. The magnitude of the problem is discussed. We examine the Naive Bayesian method in relation to the ‘Chi by degrees of Freedom’ approach, the latter used in the field of authorship identification. Both methods produce very promising results. However, the ‘Chi by degrees of Freedom’ approach has the advantage of providing significance measures, which will help to reduce false positives. Statistics based on character-level tokenization prove more effective than those based on word-level tokenization.
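The abstract does not give the exact statistic or tokenizer the authors used, so the following is only a minimal sketch of the general idea, under stated assumptions: a standard two-sample chi-squared statistic over token counts, divided by its degrees of freedom, computed on either word-level or character-level tokens. The function names (`word_tokens`, `char_tokens`, `chi_by_dof`, `classify`) and the toy corpora are hypothetical and do not come from the paper.

```python
from collections import Counter


def word_tokens(text):
    """Word-level tokenization: lowercase, split on whitespace."""
    return text.lower().split()


def char_tokens(text, n=1):
    """Character-level tokenization: overlapping character n-grams."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def chi_by_dof(sample_tokens, reference_tokens):
    """Chi-squared over a 2 x k table of token counts, divided by k - 1.

    Assumption: this is the textbook homogeneity test, used here as a
    stand-in for the 'Chi by degrees of Freedom' measure the abstract names.
    Smaller values mean the two samples have more similar token distributions.
    """
    a, b = Counter(sample_tokens), Counter(reference_tokens)
    n_a, n_b = sum(a.values()), sum(b.values())
    if n_a == 0 or n_b == 0:
        raise ValueError("both samples must contain at least one token")
    vocab = set(a) | set(b)
    chi2 = 0.0
    for tok in vocab:
        total = a[tok] + b[tok]
        exp_a = total * n_a / (n_a + n_b)  # expected count in the sample
        exp_b = total * n_b / (n_a + n_b)  # expected count in the reference
        chi2 += (a[tok] - exp_a) ** 2 / exp_a + (b[tok] - exp_b) ** 2 / exp_b
    dof = max(len(vocab) - 1, 1)
    return chi2 / dof


def classify(message, spam_text, ham_text, tokenize=word_tokens):
    """Label a message by whichever corpus it is statistically closer to."""
    msg = tokenize(message)
    spam_score = chi_by_dof(msg, tokenize(spam_text))
    ham_score = chi_by_dof(msg, tokenize(ham_text))
    return "spam" if spam_score < ham_score else "ham"


if __name__ == "__main__":
    spam = "buy cheap pills now buy now"  # toy data, not from the paper
    ham = "meeting agenda for monday project review"
    print(classify("cheap pills buy", spam, ham))
    print(classify("cheap pills buy", spam, ham,
                   tokenize=lambda t: char_tokens(t, 2)))
```

Passing a different `tokenize` function is what switches between the word-level and character-level comparisons the abstract contrasts. With real corpora the expected counts would be far larger than in this toy data, which matters because the chi-squared approximation is unreliable when expected counts are small.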

Cormac O'Brien, Carl Vogel

Information Technology | ISICT 2003 | Junk Electronic Mail | Naive Bayesian Method | Spam filters

Post Info

Added	31 Oct 2010
Updated	31 Oct 2010
Type	Conference
Year	2003
Where	ISICT
Authors	Cormac O'Brien, Carl Vogel
