Fast discovery of unexpected patterns in data, relative to a Bayesian network

We consider a model in which background knowledge on a given domain of interest is available in terms of a Bayesian network, in addition to a large database. The mining problem is to discover unexpected patterns: our goal is to find the strongest discrepancies between network and database. This problem is intrinsically difficult because it requires inference in a Bayesian network and processing the entire, potentially very large, database. A sampling-based method that we introduce is efficient and yet provably finds the approximately most interesting unexpected patterns. We give a rigorous proof of the method's correctness. Experiments shed light on its efficiency and practicality for large-scale Bayesian networks and databases.

Categories and Subject Descriptors: H.2.8 [Database Management]: Database Applications - Data Mining

General Terms: Algorithms, Experimentation, Performance

Keywords: Bayesian Networks, Association Rules, Sampling
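
To make the idea of a "discrepancy between network and database" in the abstract concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a toy two-attribute network (A -> B, with hypothetical probabilities), and it assumes that the score of an attribute set is the largest absolute difference between the value frequencies in a database sample and in a sample drawn from the network. All function names are illustrative. The paper's method additionally comes with guarantees on the approximation, which this sketch does not attempt to reproduce.

```python
import random
from collections import Counter
from itertools import combinations


def sample_network(n, rng):
    """Forward-sample n rows from a toy network A -> B.

    Hypothetical background knowledge: the network believes that
    B is independent of A, with P(A=1) = P(B=1) = 0.5.
    """
    rows = []
    for _ in range(n):
        a = 1 if rng.random() < 0.5 else 0
        b = 1 if rng.random() < 0.5 else 0
        rows.append((a, b))
    return rows


def frequencies(rows, attrs):
    """Relative frequency of each value combination of the given attributes."""
    counts = Counter(tuple(row[i] for i in attrs) for row in rows)
    return {values: c / len(rows) for values, c in counts.items()}


def discrepancy(data_rows, net_rows, attrs):
    """Assumed score: max absolute frequency difference over all value combinations."""
    p_data = frequencies(data_rows, attrs)
    p_net = frequencies(net_rows, attrs)
    keys = set(p_data) | set(p_net)
    return max(abs(p_data.get(k, 0.0) - p_net.get(k, 0.0)) for k in keys)


def rank_attribute_sets(database, n_samples, n_attrs, seed=0):
    """Score every non-empty attribute set using samples instead of full passes."""
    rng = random.Random(seed)
    # Sample the database with replacement rather than scanning all of it.
    data_rows = [rng.choice(database) for _ in range(n_samples)]
    # Sample the network rather than running exact inference.
    net_rows = sample_network(n_samples, rng)
    scored = []
    for size in range(1, n_attrs + 1):
        for attrs in combinations(range(n_attrs), size):
            scored.append((discrepancy(data_rows, net_rows, attrs), attrs))
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    # Toy database in which B always equals A, contradicting the network.
    database = [(0, 0)] * 5000 + [(1, 1)] * 5000
    for score, attrs in rank_attribute_sets(database, 5000, 2):
        print(attrs, round(score, 3))
```

In this toy setting the marginals of A and B agree between data and network, so only the joint attribute set (A, B) should receive a high score. Enumerating all attribute sets, as done here, is exponential in the number of attributes and is only workable for very small examples.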

Szymon Jaroszewicz, Tobias Scheffer

Data Mining | Interesting Unexpected Patterns | KDD 2005 | Keywords Bayesian Networks | Large-scale Bayesian Networks |

Added	30 Nov 2009
Updated	30 Nov 2009
Type	Conference
Year	2005
Where	KDD
Authors	Szymon Jaroszewicz, Tobias Scheffer
