Cloaking and redirection are two techniques that can be used for search engine spamming. To understand how they are used on the Web, we downloaded two sets of Web pages, both while mimicking a popular Web crawler and while mimicking a common Web browser. We estimate that 3% of the first data set and 9% of the second data set utilize cloaking of some kind. In a manually checked sample of the cloaking pages from the second data set, nearly one third appear to aim to manipulate search engine ranking. We also examined the redirection methods present in the first data set. We propose a method of detecting cloaking pages by calculating the differences among three copies of the same page. We examine the different types of cloaking found and the distribution of the different types of redirection.
Baoning Wu, Brian D. Davison
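
The three-copy comparison mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which copies are compared or how the difference is measured, so everything below is an assumption made for illustration: two copies are taken to be fetched as a crawler and one as a browser, the difference is taken to be the number of terms that appear in only one of two copies, and the names terms, term_difference, looks_cloaked, and margin are hypothetical. The idea is that a dynamic page also changes between two crawler fetches, so a page is flagged only when the crawler and browser copies differ by more than the two crawler copies do.

```python
import re


def terms(html: str) -> set:
    """Return the set of lowercase alphanumeric terms in a page, ignoring tags."""
    text = re.sub(r"<[^>]+>", " ", html)  # crude tag stripping, enough for a sketch
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def term_difference(page_a: str, page_b: str) -> int:
    """Count the terms that appear in exactly one of the two pages."""
    return len(terms(page_a) ^ terms(page_b))


def looks_cloaked(crawler_1: str, browser: str, crawler_2: str, margin: int = 0) -> bool:
    """Flag a page when crawler and browser copies differ more than two crawler copies.

    Assumption: the crawler-to-crawler difference approximates ordinary
    dynamic change, so only the excess beyond it (plus a hypothetical
    margin) is treated as a sign of cloaking.
    """
    crawler_vs_browser = term_difference(crawler_1, browser)
    crawler_vs_crawler = term_difference(crawler_1, crawler_2)
    return crawler_vs_browser > crawler_vs_crawler + margin
```

A page that serves keyword-heavy text to the crawler and different text to the browser would be flagged under these assumptions, while a page whose content merely rotates on every request would not. A flagged page is only a candidate, since the abstract notes that many cloaking pages do not appear to target search engine ranking.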