Motivated by the increasing need to analyze complex, uncertain multidimensional data this paper proposes probabilistic OLAP queries that are computed using probability distributions rather than atomic values. The paper describes how to create probability distributions from base data, and how the distributions can be subsequently used in pre-aggregation. Since the probability distributions can become large, we show how to achieve good time and space efficiency by approximating the distributions. We present the results of several experiments that demonstrate the effectiveness of our methods. The work is motivated with a real-world case study, based on our collaboration with a leading Danish vendor of location-based services. This paper is the first to consider the approximate processing of probabilistic OLAP queries over probability distributions. Categories and Subject Descriptors: H.2 [Information Systems]: Database Management; H.2.8 [Database Management]: Database Applications — ...
Igor Timko, Curtis E. Dyreson, Torben Bach Pederse