Attribute value reordering for efficient hybrid OLAP

15 years 11 months ago

Download www.daniel-lemire.com

The normalization of a data cube is the process of choosing an ordering for the attribute values, and the chosen ordering will affect the physical storage of the cube's data. For large multidimensional arrays, proper normalization can lead to more efficient storage in hybrid OLAP contexts that store dense and sparse chunks differently. We show that it is NP-hard to compute an optimal normalization even for 1 × 3 chunks, although we find an exact algorithm for 1 × 2 chunks. When attributes are nearly statistically independent, we show that an optimal normalization is given by dimension-wise attribute frequency sorting, which can be done in time O(d n log(n)) for data cubes of size n^d. When attributes are not independent, we propose and evaluate a number of heuristics. Our optimized hybrid OLAP storage mechanism was observed to be 44% more storage efficient than ROLAP and the gains due to normalization alone accounted for 45% of this increase in efficiency. Data Cubes, Normalizatio...
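The dimension-wise attribute frequency sorting mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the sparse representation (a list of coordinate tuples for allocated cells), the function names `frequency_sort_normalization` and `apply_normalization`, and the tie-breaking rule are all assumptions made for illustration. For each dimension, attribute values are counted over the allocated cells and renumbered so that the most frequent value gets index 0.

```python
from collections import Counter


def frequency_sort_normalization(cells, d):
    """Return one value -> new-index mapping per dimension.

    cells: iterable of d-tuples, one per allocated (non-empty) cell.
    Assumption: values that never occur in an allocated cell are
    simply absent from the mapping.
    """
    cells = list(cells)
    mappings = []
    for dim in range(d):
        # Count how many allocated cells use each attribute value.
        counts = Counter(cell[dim] for cell in cells)
        # Most frequent first; ties broken by string form (an
        # arbitrary choice, only to keep the result deterministic).
        ordered = sorted(counts, key=lambda v: (-counts[v], str(v)))
        mappings.append({v: i for i, v in enumerate(ordered)})
    return mappings


def apply_normalization(cells, mappings):
    """Rewrite each cell's coordinates using the per-dimension mappings."""
    return sorted(tuple(m[v] for m, v in zip(mappings, cell)) for cell in cells)


cells = [("a", "x"), ("b", "x"), ("b", "y"), ("c", "x"), ("b", "z")]
mappings = frequency_sort_normalization(cells, d=2)
normalized = apply_normalization(cells, mappings)
print(mappings)
print(normalized)
```

In this toy cube, the frequent values `b` and `x` are moved to index 0 in their dimensions, so the allocated cells cluster toward the origin. Each dimension needs one counting pass plus one sort of its attribute values, which is where the per-dimension n log(n) term comes from.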

Owen Kaser, Daniel Lemire
