TKDE
2012

Data Cube Materialization and Mining over MapReduce

Computing interesting measures for data cubes and subsequent mining of interesting cube groups over massive datasets are critical for many important analyses done in the real world. Previous studies have focused on algebraic measures such as SUM that are amenable to parallel computation and can easily benefit from the recent advancement of parallel computing infrastructure such as MapReduce. Dealing with holistic measures such as TOP-K, however, is non-trivial. In this paper, we detail real-world challenges in cube materialization and mining tasks on Web-scale datasets. Specifically, we identify an important subset of holistic measures and introduce MR-Cube, a MapReduce-based framework for efficient cube computation and identification of interesting cube groups on holistic measures. We provide extensive experimental analyses over both real and synthetic data. We demonstrate that, unlike existing techniques which cannot scale to the 100 million tuple mark for our datasets, MR-Cube...
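To make the abstract's point about algebraic measures concrete, here is a minimal sketch of naive cube materialization with SUM, written as a map step and a reduce step in plain Python. It is not MR-Cube and is not taken from the paper; the function names, the sample rows, and the `country`/`device`/`clicks` fields are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cube_groups(row, dims):
    """Map step (hypothetical helper): yield every cube group this row belongs to."""
    for r in range(len(dims) + 1):
        for subset in combinations(dims, r):
            # Dimensions left out of the subset are rolled up to "*".
            yield tuple(row[d] if d in subset else "*" for d in dims)


def materialize_sum_cube(rows, dims, measure):
    """Reduce step: SUM is algebraic, so partial sums can be combined in any order."""
    cube = defaultdict(int)
    for row in rows:
        for group in cube_groups(row, dims):
            cube[group] += row[measure]
    return dict(cube)


# Illustrative data only; not from the paper's datasets.
rows = [
    {"country": "US", "device": "mobile", "clicks": 3},
    {"country": "US", "device": "desktop", "clicks": 5},
    {"country": "IN", "device": "mobile", "clicks": 2},
]
cube = materialize_sum_cube(rows, ["country", "device"], "clicks")
```

Each row contributes to 2^d cube groups for d dimensions, and because SUM can be accumulated from partial results, the reduce step needs only a running total per group. A holistic measure has no such fixed-size partial result, so a reducer in this naive scheme would have to hold every value of a group at once, which is where large groups become a problem.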
Arnab Nandi, Cong Yu, Philip Bohannon, Raghu Ramakrishnan
Added 29 Sep 2012
Updated 29 Sep 2012
Type Journal
Year 2012
Where TKDE
Authors Arnab Nandi, Cong Yu, Philip Bohannon, Raghu Ramakrishnan