With the explosive growth of the Internet, e-mail is regarded as one of the most important means of communication and a substitute for traditional communication. As e-mail has become a major means of communication in the Internet age, the exponential growth of spam mail has emerged as a major problem, and researchers have proposed many methodologies to address it. In particular, Bayesian classifier-based systems filter spam mail with high performance, and many commercial products are available. However, they have several problems. First, they have a cold start problem: a training phase on both spam and non-spam mail must be completed before the system can be run. Second, their cost for filtering spam mail is higher than that of rule-based systems. The last problem, on which we focus, is that filtering performance decreases when an e-mail has only a few terms that represent its contents. To solve this problem, we suggest a spam mail filtering system us...
Hyun-Jun Kim, Heung-Nam Kim, Jason J. Jung, GeunSi