NetCube: A Scalable Tool for Fast Data Mining and Compression

We propose a novel method of computing and storing DataCubes. Our idea is to use Bayesian networks, which can generate approximate counts for any query combination of attribute values and “don’t cares.” A Bayesian network represents the underlying joint probability distribution of the data that were used to generate it. By means of such a network, the proposed method, NetCube, exploits correlations among attributes. Our proposed preprocessing algorithm scales linearly with the size of the database and is thus scalable; it is also parallelizable in a straightforward way. Moreover, we give an algorithm to estimate counts of arbitrary queries that is fast (constant in the database size). Experimental results show that NetCubes have fast generation and use (a few ...

This material is based upon work supported by the National Science Foundation under Grants No. DMS-9873442, IIS-9817496, IIS-9910606, IIS-9988876, LIS 9720374, IIS-0083148, IIS-0113089, and by the Defe...
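The idea in the abstract of answering count queries with "don't cares" from a Bayesian network can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the three binary attributes, the network structure (A -> B, A -> C), the probability tables, the database size `N_RECORDS`, and the function name `estimate_count`. The sketch marginalizes by brute-force enumeration, which is not the inference algorithm of the paper; it only shows that an estimated count is the database size times the probability of the query under the network.

```python
from itertools import product

# Hypothetical toy network over three binary attributes: A -> B and A -> C.
# All numbers below are made up for illustration.
N_RECORDS = 1000  # assumed number of records in the database

P_A = {0: 0.6, 1: 0.4}                                    # P(A = a)
P_B_GIVEN_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}  # P_B_GIVEN_A[a][b]
P_C_GIVEN_A = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.2, 1: 0.8}}  # P_C_GIVEN_A[a][c]


def joint(a, b, c):
    """Joint probability P(A=a, B=b, C=c) as factored by the network."""
    return P_A[a] * P_B_GIVEN_A[a][b] * P_C_GIVEN_A[a][c]


def estimate_count(query, n_records=N_RECORDS):
    """Estimate how many records match `query`.

    `query` maps attribute names ("A", "B", "C") to 0, 1, or None,
    where None (or a missing key) means "don't care".
    """
    prob = 0.0
    # Sum the joint probability over every full assignment that is
    # consistent with the fixed attribute values in the query.
    for a, b, c in product((0, 1), repeat=3):
        assignment = {"A": a, "B": b, "C": c}
        if all(v is None or assignment[k] == v for k, v in query.items()):
            prob += joint(a, b, c)
    return n_records * prob


if __name__ == "__main__":
    print(estimate_count({"A": 1, "B": None, "C": None}))
    print(estimate_count({"A": None, "B": 1, "C": None}))
```

Note that the loop runs over assignments of the network's attributes, not over database records, so in this sketch the query cost does not depend on the database size, in line with the abstract's claim. Enumeration is exponential in the number of attributes, so a realistic implementation would need a more efficient inference method.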

Dimitris Margaritis, Christos Faloutsos, Sebastian Thrun

Bayesian Network | Defense Advanced Research Projects Agency | National Science Foundation | VLDB 2001 |

» Mining compressed commodity workflows from massive RFID data sets

» Fast Scalable Disk Imaging with Frisbee

» ShatterPlots Fast Tools for Mining Large Graphs

» Interactive exploration of coherent patterns in timeseries gene expression data

Post Info

Added	30 Jul 2010
Updated	30 Jul 2010
Type	Conference
Year	2001
Where	VLDB
Authors	Dimitris Margaritis, Christos Faloutsos, Sebastian Thrun
