

On the Weakenesses of Correlation

The correlation of the result lists provided by search engines is a fundamental question with deep, multidisciplinary ramifications. Here, we present automatic and unsupervised methods to assess whether search engines provide results that are comparable or correlated. We make two main contributions. First, we provide evidence that for more than 80% of the input queries, independently of their frequency, the two major search engines share only three or fewer URLs in their search results, leading to an increasing divergence. In this divergent scenario, we show that even the most robust measures based on comparing lists are useless to apply; that is, the small contribution from too few common items yields no confidence. Second, to overcome this problem, we propose the first content-based measures, i.e., direct comparison of the contents from search results; these measures are based on the Jaccard ratio and distribution similarity measures (CDF measures). We show that th...
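To make the contrast between URL overlap and a content-based comparison concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names, the whitespace tokenizer, and the example result lists are all assumptions made for illustration, and it covers only the Jaccard ratio, not the CDF measures.

```python
from typing import Iterable, List, Set


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard ratio |A ∩ B| / |A ∪ B| of two collections treated as sets."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(set_a & set_b) / len(set_a | set_b)


def shared_urls(results_a: List[str], results_b: List[str]) -> int:
    """Number of URLs that appear in both result lists, ignoring rank."""
    return len(set(results_a) & set(results_b))


def content_tokens(snippets: Iterable[str]) -> Set[str]:
    """Lowercased whitespace tokens pooled over all snippets (a naive tokenizer)."""
    tokens: Set[str] = set()
    for text in snippets:
        tokens.update(text.lower().split())
    return tokens


# Hypothetical result lists for one query from two engines.
engine_a_urls = ["a.example/1", "b.example/2", "c.example/3"]
engine_b_urls = ["b.example/2", "d.example/4", "e.example/5"]

# Hypothetical page contents (or snippets) behind those results.
engine_a_text = ["rank correlation of search results", "comparing ranked lists"]
engine_b_text = ["comparing search results", "rank correlation measures"]

url_overlap = shared_urls(engine_a_urls, engine_b_urls)
url_jaccard = jaccard(engine_a_urls, engine_b_urls)
content_jaccard = jaccard(content_tokens(engine_a_text), content_tokens(engine_b_text))
```

With only one URL in common, a list-based measure has very little to work with, while the content comparison can still register that the two engines returned similar material.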
Paolo D'Alberto, Ali Dasdan
Added 20 Aug 2011
Updated 20 Aug 2011
Type Journal
Year 2011
Where CORR
Authors Paolo D'Alberto, Ali Dasdan