Sciweavers

KDD
2004
ACM

Exploiting a support-based upper bound of Pearson's correlation coefficient for efficiently identifying strongly correlated pair

Given a user-specified minimum correlation threshold and a market basket database with N items and T transactions, an all-strong-pairs correlation query finds all item pairs with correlations above the threshold. However, when the numbers of items and transactions are large, the computation cost of this query can be very high. In this paper, we identify an upper bound of Pearson's correlation coefficient for binary variables. This upper bound is not only much cheaper to compute than Pearson's correlation coefficient but also exhibits a special monotone property, which allows many item pairs to be pruned without even computing their upper bounds. A Two-step All-strong-Pairs corrElation queRy (TAPER) algorithm is proposed to exploit these properties in a filter-and-refine manner. Furthermore, we provide an algebraic cost model which shows that the computation savings from pruning are independent of the number of items, or improve as it increases, in data sets with common Zipf or li...
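The filter-and-refine idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' TAPER implementation. It assumes the bound takes the form that follows from supp(A,B) <= min(supp(A), supp(B)): for supp(A) >= supp(B), phi <= sqrt(supp(B)/supp(A)) * sqrt((1 - supp(A))/(1 - supp(B))). The function names (`phi`, `phi_upper_bound`, `strong_pairs`), the transaction-id-set representation, and the choice to treat "above the threshold" as `>=` are assumptions made for illustration only.

```python
from math import sqrt


def phi(supp_a, supp_b, supp_ab):
    """Pearson's correlation of two binary items, computed from supports."""
    denom = sqrt(supp_a * supp_b * (1 - supp_a) * (1 - supp_b))
    return (supp_ab - supp_a * supp_b) / denom


def phi_upper_bound(supp_a, supp_b):
    """Support-only bound on phi; needs no joint support (assumed form)."""
    hi, lo = max(supp_a, supp_b), min(supp_a, supp_b)
    return sqrt(lo / hi) * sqrt((1 - hi) / (1 - lo))


def strong_pairs(transactions, threshold):
    """Return (a, b, phi) for every item pair with phi >= threshold."""
    n = len(transactions)
    tids = {}  # item -> set of ids of the transactions containing it
    for tid, basket in enumerate(transactions):
        for item in set(basket):
            tids.setdefault(item, set()).add(tid)

    # Items present in every transaction have undefined correlation; skip them.
    # Sort by support, highest first, so the bound shrinks along each row.
    items = sorted((i for i in tids if len(tids[i]) < n),
                   key=lambda i: (-len(tids[i]), i))

    result = []
    for idx, a in enumerate(items):
        supp_a = len(tids[a]) / n
        for b in items[idx + 1:]:
            supp_b = len(tids[b]) / n
            # Filter: for fixed a the bound only decreases as supp(b) drops,
            # so the first failure prunes every remaining b unseen.
            if phi_upper_bound(supp_a, supp_b) < threshold:
                break
            # Refine: compute the exact correlation for surviving pairs.
            r = phi(supp_a, supp_b, len(tids[a] & tids[b]) / n)
            if r >= threshold:
                result.append((a, b, r))
    return result


# Hypothetical toy database of six transactions.
baskets = [["a", "b"], ["a", "b"], ["a", "b", "c"], ["a"], ["c"], ["d"]]
print(strong_pairs(baskets, 0.6))
```

Sorting items by decreasing support is what turns the monotone property into an early `break`: once one candidate partner fails the bound, all lower-support partners fail too, so their bounds are never evaluated.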
Added 30 Nov 2009
Updated 30 Nov 2009
Type Conference
Year 2004
Where KDD
Authors Hui Xiong, Shashi Shekhar, Pang-Ning Tan, Vipin Kumar