To circumvent prevalent text-based anti-spam filters, spammers have begun embedding advertisement text in images. Analogously, proprietary information (such as source code) may be sent as screenshots to defeat text-based monitoring of outbound e-mail. The proposed method separates spam images from other common categories of e-mail images based on extracted overlay text and color features. No expensive OCR processing is necessary. Our method works robustly despite complex backgrounds, compression artifacts, and a wide variety of formats and fonts of overlaid spam text. It is also shown to detect screenshots in outbound e-mail successfully.
Hrishikesh Aradhye, Gregory K. Myers, James A. Her