Email spam filters are commonly trained on a sample of spam and ham (non-spam) messages. We investigate the effect on filter performance of using samples of spam and ham messages sent months before those to be filtered. Our results show that filter performance deteriorates with the overall age of spam and ham samples, but at different rates. Spam and ham samples of different ages may be mixed to advantage, provided temporal cues are elided.

Categories and Subject Descriptors: H.3.3 [Information Search and Retrieval]: information filtering

General Terms: Experimentation, Measurement
Gordon V. Cormack, Jose-Marcio Martins da Cruz