Finding Associations and Computing Similarity via Biased Pair Sampling

Sampling-based methods have previously been proposed for the problem of finding interesting associations in data, even for low-support items. While these methods do not guarantee precise results, they can be vastly more efficient than approaches that rely on exact counting. However, for many similarity measures no such methods have been known. In this paper we show how a wide variety of measures can be supported by a simple biased sampling method. The method also extends to find high-confidence association rules. We demonstrate theoretically that our method is superior to exact methods when the threshold for "interesting similarity/confidence" is above the average pairwise similarity/confidence, and the average support is not too low. Our method is particularly advantageous when transactions contain many items. We confirm in experiments on standard association mining benchmarks that we obtain a significant speedup on real data sets. Reductions in computation time of over an ...
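
As a rough illustration of the biased-sampling idea mentioned in the abstract, the Python sketch below keeps each co-occurrence of an item pair with a probability that shrinks as the supports of the two items grow, so that the expected sample count of a pair is proportional to its cosine similarity. This is a sketch under assumptions, not the authors' algorithm: the function names, the `scale` parameter, the choice of cosine similarity, and the brute-force enumeration of pairs are all illustrative, and it makes no attempt to reproduce the efficiency described in the paper.

```python
import itertools
import math
import random
from collections import Counter


def biased_pair_sample(transactions, scale, rng):
    """Count sampled item pairs (hypothetical helper, not from the paper).

    Each co-occurrence of (i, j) in a transaction is kept with probability
    min(1, scale / sqrt(supp(i) * supp(j))). When no probability is clipped,
    the expected count of a pair equals scale * cosine(i, j).
    """
    # Support = number of transactions containing the item.
    support = Counter(item for t in transactions for item in set(t))
    counts = Counter()
    for t in transactions:
        # Brute-force enumeration of pairs, for clarity only.
        for i, j in itertools.combinations(sorted(set(t)), 2):
            p = min(1.0, scale / math.sqrt(support[i] * support[j]))
            if rng.random() < p:
                counts[(i, j)] += 1
    return counts


def estimate_cosine(counts, scale):
    """Turn sampled counts into rough cosine-similarity estimates."""
    return {pair: c / scale for pair, c in counts.items()}


# Toy data (made up for illustration): a and b always co-occur,
# c and d co-occur only rarely.
transactions = (
    [["a", "b"]] * 200 + [["c"]] * 200 + [["d"]] * 200 + [["c", "d"]] * 20
)
rng = random.Random(0)
scale = 50  # assumed tuning knob: larger means more samples, less noise
estimates = estimate_cosine(biased_pair_sample(transactions, scale, rng), scale)
```

Pairs whose estimate exceeds a chosen threshold would then be reported as candidates; a larger `scale` trades extra work for lower variance in the estimates.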

Andrea Campagna, Rasmus Pagh

Average Pairwise Similarity/confidence | Biased Sampling Method | Data Mining | ICDM 2009 | Sampling-based Methods

Added	23 May 2010
Updated	23 May 2010
Type	Conference
Year	2009
Where	ICDM
Authors	Andrea Campagna, Rasmus Pagh
