CEAS
2007
Springer

Filtering Image Spam with Near-Duplicate Detection

A new trend in email spam is the emergence of image spam. Although current anti-spam technologies are quite successful in filtering text-based spam emails, image spam is substantially more difficult to detect because it employs a variety of image creation and randomization algorithms. Spam image creation algorithms are designed to defeat well-known vision algorithms such as optical character recognition (OCR), whereas randomization techniques ensure the uniqueness of each image. We observe that image spam is often sent in batches of visually similar images that differ only due to the application of randomization algorithms. Based on this observation, we propose an image spam detection system that uses near-duplicate detection to detect spam images. We rely on traditional anti-spam methods to detect a subset of spam images and then use multiple image spam filters to detect all the spam images that “look” like the spam caught by traditional methods. ...
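The near-duplicate idea in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the abstract does not say which filters are used, so the example substitutes a generic average-hash signature compared by Hamming distance. The names `average_hash`, `hamming`, `is_near_duplicate`, and the parameters `grid` and `max_bits` are all hypothetical, and images are assumed to be plain grayscale pixel grids.

```python
from typing import Iterable, List, Sequence

# Assumption: an image is a grid of grayscale pixel values (0..255).
Image = Sequence[Sequence[int]]


def average_hash(img: Image, grid: int = 4) -> int:
    """Signature: mean of each grid cell, thresholded at the overall mean.

    Assumes the image is at least `grid` pixels in each dimension.
    """
    h, w = len(img), len(img[0])
    means: List[float] = []
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * h // grid, (gy + 1) * h // grid
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            block = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            means.append(sum(block) / len(block))
    avg = sum(means) / len(means)
    bits = 0
    for m in means:
        bits = (bits << 1) | (1 if m > avg else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")


def is_near_duplicate(candidate: Image, known_spam: Iterable[int],
                      max_bits: int = 2, grid: int = 4) -> bool:
    """True if the candidate's signature is close to any known spam signature."""
    sig = average_hash(candidate, grid)
    return any(hamming(sig, k) <= max_bits for k in known_spam)


# A spam image caught by a traditional filter: dark left half, bright right half.
base = [[0] * 4 + [200] * 4 for _ in range(8)]
# A lightly "randomized" variant of the same image (small per-pixel offsets).
noisy = [[min(255, p + ((x + y) % 3) * 3) for x, p in enumerate(row)]
         for y, row in enumerate(base)]
# A visually different image: bright top half, dark bottom half.
other = [[200] * 8 for _ in range(4)] + [[0] * 8 for _ in range(4)]

known = [average_hash(base)]
print(is_near_duplicate(noisy, known))  # True
print(is_near_duplicate(other, known))  # False
```

A coarse signature tolerates small pixel-level perturbations while still separating images with different layouts. A real filter would need signatures that hold up against stronger randomization than the small offsets used here.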
Zhe Wang, William K. Josephson, Qin Lv, Moses Charikar, Kai Li
Added 07 Jun 2010
Updated 07 Jun 2010
Type Conference
Year 2007
Where CEAS
Authors Zhe Wang, William K. Josephson, Qin Lv, Moses Charikar, Kai Li