Web forums are increasingly spammed by miscreants seeking to attract visitors to their (often malicious) websites. In this paper, we study the prevalence of forum spamming and find that Internet users are at high risk of encountering forums with spam links posted on them. To mitigate the problem, we examine the characteristics of 286 days of forum spam posted at a research blog and develop lightweight features based on spammers’ IP addresses, commenting activity, and the anatomy of their posts. We find that an SVM classifier trained on these features achieves 99.81% precision and 92.82% recall in identifying forum spam.
Youngsang Shin, Minaxi Gupta, Steven A. Myers
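
As a reminder of what the two metrics reported in the abstract measure, the following minimal sketch computes precision and recall from binary labels. The function name `precision_recall` and the toy label lists are hypothetical, chosen only for illustration; they are not the paper's data or evaluation code.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    # True positives: spam correctly flagged as spam.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: legitimate posts wrongly flagged as spam.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: spam that slipped through unflagged.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels (assumed for illustration): 1 = spam, 0 = legitimate.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

High precision means few legitimate posts are mislabelled as spam, while recall measures the fraction of actual spam that is caught.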