In this paper we analyze a very large junk e-mail corpus that was generated by a hundred thousand volunteer users of the Hotmail e-mail service. We describe how the corpus is being collected, and analyze: the geographic origins of the e-mail; who the e-mail is targeting; and what the e-mail is selling.

Categories and Subject Descriptors: K.4.1 [Computers and Society]: Public Policy Issues – abuse and crime involving computers, transborder data flow, privacy.

General Terms: Measurement, Economics, Legal Aspects.

Keywords: Junk E-mail, spam, international e-mail.

Geoff Hulten, Joshua T. Goodman, Robert Rounthwaite