Recent email spam filtering evaluations, such as those conducted at TREC, have shown that near-perfect filtering results are attained with a variety of machine learning methods wh...
Similarity measures for text have historically been an important tool for solving information retrieval problems. In many interesting settings, however, documents are often closel...
Unsolicited commercial email is a significant problem for users and providers of email services. While statistical spam filters have proven useful, senders of spam are learning ...
Most classification methods are based on the assumption that the data conforms to a stationary distribution. However, real-world data is usually collected over certain periods...
Recently, spammers have proliferated "image spam", emails that contain the text of the spam message in a human-readable image instead of the message body, making detect...