In this paper, we present a new spam filter that acts as an additional layer in the spam filtering process. The filter is based on what we call a representative vocabulary. Spam e-mails are divided into categories, and each category is represented by a set of tokens that form a Representative Text (RT). Tokens are strings of characters (words, sentences, or sometimes meaningless strings of characters). The RT is used to compute a resemblance ratio between the category and each incoming e-mail, and this ratio determines whether the incoming e-mail is classified as spam. The filter was implemented and integrated into the Spamihilator software. Experimental results are presented.
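To make the idea of a resemblance ratio concrete, the following is a minimal sketch under stated assumptions, not the method used in the paper. The abstract does not define the ratio, so here it is assumed to be the fraction of a category's RT tokens that occur in the e-mail. The word-level tokenizer, the example categories, the function names, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (assumption: words only,
    although the paper also allows sentences and arbitrary strings)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def resemblance_ratio(email_text, representative_tokens):
    """Assumed definition: share of the RT tokens found in the e-mail."""
    if not representative_tokens:
        return 0.0
    email_tokens = tokenize(email_text)
    return len(email_tokens & representative_tokens) / len(representative_tokens)

def classify(email_text, categories, threshold=0.5):
    """Return (category, ratio) for the best-matching category, or
    (None, ratio) when the best ratio is below the hypothetical threshold."""
    best_name, best_ratio = None, 0.0
    for name, tokens in categories.items():
        ratio = resemblance_ratio(email_text, tokens)
        if ratio > best_ratio:
            best_name, best_ratio = name, ratio
    if best_ratio >= threshold:
        return best_name, best_ratio
    return None, best_ratio

# Hypothetical representative texts, one token set per spam category.
CATEGORIES = {
    "pharmacy": {"cheap", "pills", "pharmacy", "prescription"},
    "finance": {"loan", "credit", "approved", "mortgage"},
}

if __name__ == "__main__":
    print(classify("Cheap pills, no prescription needed!", CATEGORIES))
    print(classify("Meeting moved to 3pm tomorrow", CATEGORIES))
```

In this sketch an e-mail is flagged as soon as one category's ratio reaches the threshold. A real deployment would need RT sets derived from actual spam corpora and a threshold tuned on labeled data.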
L. Pelletier, Jalal Almhana, Vartan Choulakian