Social media are becoming increasingly popular and have attracted considerable attention from spammers. Using a sample of more than ninety thousand known spam Web sites, we found between 7% to 18% of their URLs are posted on two popular social media Web sites, digg.com and delicious.com. In this paper, we present a co-classification framework to detect Web spam and the spammers who are responsible for posting them on the social media Web sites. The rationale for our approach is that since both detection tasks are related, it would be advantageous to train them simultaneously to make use of the labeled examples in the Web spam and spammer training data. We have evaluated the effectiveness of our algorithm on the delicious.com data set. Our experimental results showed that the proposed co-classification algorithm significantly outperforms classifiers that learn each detection task independently. Categories and Subject Descriptors H.3.3 [Information Storage and Retrieval]: Information Se...
Feilong Chen, Pang-Ning Tan, Anil K. Jain