Detecting spam web pages through content analysis


In this paper, we continue our investigations of "web spam": the injection of artificially created pages into the web in order to influence the results from search engines and to drive traffic to certain pages for fun or profit. This paper considers some previously undescribed techniques for automatically detecting spam pages and examines the effectiveness of these techniques in isolation and when aggregated using classification algorithms. When combined, our heuristics correctly identify 2,037 (86.2%) of the 2,364 spam pages, which make up 13.8% of our judged collection of 17,168 pages, while misidentifying 526 spam and non-spam pages (3.1% of the collection).

Categories and Subject Descriptors: H.5.4 [Information Interfaces and Presentation]: Hypertext/Hypermedia; K.4.m [Computers and Society]: Miscellaneous; H.4.m [Information Systems]: Miscellaneous

General Terms: Measurement, Experimentation, Algorithms

Keywords: Web characterization, web pages, web spam, data mining
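The percentages quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is only a sanity check on the reported counts, not part of the paper's method. The variable names and the `pct` helper are illustrative assumptions, and it assumes the 3.1% figure is taken relative to the whole judged collection, which is the reading the numbers support.

```python
# Counts quoted in the abstract; the names are illustrative, not the authors'.
TOTAL_PAGES = 17_168     # pages in the judged collection
SPAM_PAGES = 2_364       # pages judged to be spam
SPAM_DETECTED = 2_037    # spam pages correctly identified
MISIDENTIFIED = 526      # spam and non-spam pages that were misclassified


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Share of spam pages that the combined heuristics caught.
detection_rate = pct(SPAM_DETECTED, SPAM_PAGES)
# Share of the judged collection that is spam.
spam_share = pct(SPAM_PAGES, TOTAL_PAGES)
# Share of the judged collection that was misclassified (assumed denominator).
error_rate = pct(MISIDENTIFIED, TOTAL_PAGES)

print(f"detected spam: {detection_rate}%")
print(f"spam share of collection: {spam_share}%")
print(f"misidentified pages: {error_rate}%")
```

Each computed value rounds to the figure given in the abstract, which confirms that the three percentages use two different denominators: the spam subset for the detection rate and the full collection for the other two.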

Alexandros Ntoulas, Marc Najork, Mark Manasse, Dennis Fetterly


Internet Technology | Non-spam Pages | Spam Pages | Web Pages | WWW 2006 |


» Spam, Damn Spam, and Statistics: Using Statistical Analysis to Locate Spam Web Pages

» Looking into the past to better classify web spam

» Detecting Comment Spam through Content Analysis

» Characterizing Web Spam Using Content and HTTP Session Analysis

» Graph regularization methods for Web spam detection

» Spam detection using web page content: a new battleground

» Identifying web spam with user behavior analysis

» Link Spam Detection based on DBSpamClust with Fuzzy C-means Clustering

Post Info

Added	22 Nov 2009
Updated	22 Nov 2009
Type	Conference
Year	2006
Where	WWW
Authors	Alexandros Ntoulas, Marc Najork, Mark Manasse, Dennis Fetterly
