Fast Uncertainty Sampling for Labeling Large E-mail Corpora

One of the biggest challenges in building effective anti-spam solutions is designing systems to defend against the ever-evolving bag of tricks spammers use to defeat them. Because of this, spam filters that work well today may not work well tomorrow. The adversarial nature of the spam problem makes large, up-to-date, and diverse e-mail corpora critical for the development and evaluation of new anti-spam filtering technologies. Gathering large collections of messages can actually be quite easy, especially in a large corporate or ISP environment. The challenge, however, is not necessarily in collecting enough mail, but in collecting a representative distribution of mail types as seen "in the wild" and then accurately labeling the hundreds of thousands or millions of accumulated messages as spam or non-spam. In the field of machine learning, Uncertainty Sampling is a well-known Active Learning algorithm which uses a collaborative model to minimize the human effo...
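To make the idea concrete, here is a minimal sketch of conventional uncertainty sampling in its generic textbook form, not the method proposed in the paper. It assumes a binary spam classifier that outputs a spam probability, and it treats the messages whose probability is closest to 0.5 as the most uncertain. All names (`select_most_uncertain`, `uncertainty_sampling`, `train`, `predict_proba`, `ask_human`) are hypothetical placeholders chosen for illustration.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


def select_most_uncertain(probs: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k probabilities closest to 0.5."""
    ranked = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return ranked[:k]


def uncertainty_sampling(
    pool: Sequence[Any],
    train: Callable[[List[Tuple[Any, bool]]], Any],   # hypothetical trainer
    predict_proba: Callable[[Any, Any], float],        # hypothetical scorer
    ask_human: Callable[[Any], bool],                  # hypothetical labeler
    rounds: int,
    batch_size: int,
) -> Dict[int, bool]:
    """Label the pool iteratively, asking a human only about uncertain messages."""
    labeled: Dict[int, bool] = {}
    for _ in range(rounds):
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        # Retrain on everything labeled so far.
        model = train([(pool[i], y) for i, y in labeled.items()])
        # Score every remaining unlabeled message in the pool.
        probs = [predict_proba(model, pool[i]) for i in unlabeled]
        # Ask the human about the messages the model is least sure of.
        for j in select_most_uncertain(probs, batch_size):
            idx = unlabeled[j]
            labeled[idx] = ask_human(pool[idx])
    return labeled
```

In this sketch every round retrains the model and rescores the entire unlabeled pool, so the cost per round grows with the size of the corpus.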

Richard Segal, Ted Markowitz, William Arnold

Approximate Uncertainty | CEAS 2006 | Conventional Uncertainty Sampling | Internet Technology | Uncertainty Sampling |

Added	20 Aug 2010
Updated	20 Aug 2010
Type	Conference
Year	2006
Where	CEAS
Authors	Richard Segal, Ted Markowitz, William Arnold
