Querying uncertain data has emerged as an important problem in data management because many measurement data are inherently imprecise. In this paper we study the problem of answering range queries over uncertain data. Specifically, we are given a collection P of n points in ℝ, each represented by its one-dimensional probability density function (pdf). The goal is to build an index on P such that, given a query interval I and a probability threshold τ, we can quickly report all points of P that lie in I with probability at least τ. We present various indexing schemes with linear or near-linear space and logarithmic query time. Our schemes support pdfs that are either histograms or more complex ones such as Gaussian or piecewise algebraic. They also extend to the external memory model, in which the goal is to minimize the number of disk accesses when querying the index.

Categories and Subject Descriptors: F.2 [Analysis of algorithms and problem complexity]: Nonnumerical algorithms and problems; H.3.1 [...
Pankaj K. Agarwal, Siu-Wing Cheng, Yufei Tao, Ke Y
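
To make the query in the abstract concrete, the following is a minimal sketch of a brute-force baseline for histogram pdfs. It is not one of the paper's indexing schemes: it scans every point in linear time, which is exactly the cost an index is meant to avoid. The bucket representation `(left, right, mass)`, the function names `prob_in_interval` and `range_query`, the parameter name `tau` for the threshold, and the sample data are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed representation: a histogram pdf is a list of buckets
# (left, right, mass), with mass spread uniformly over [left, right]
# and the masses of one histogram summing to 1.
Bucket = Tuple[float, float, float]
Histogram = List[Bucket]


def prob_in_interval(hist: Histogram, a: float, b: float) -> float:
    """Probability that a point with histogram pdf `hist` lies in [a, b]."""
    total = 0.0
    for left, right, mass in hist:
        # Length of the overlap between the bucket and the query interval.
        overlap = min(right, b) - max(left, a)
        if overlap > 0:
            total += mass * overlap / (right - left)
    return total


def range_query(points: List[Histogram], a: float, b: float, tau: float) -> List[int]:
    """Indices of all points lying in [a, b] with probability at least tau.

    Linear scan over all points; a baseline only, not an index.
    """
    return [i for i, hist in enumerate(points)
            if prob_in_interval(hist, a, b) >= tau]


# Hypothetical sample data: three uncertain points.
points = [
    [(0.0, 2.0, 0.5), (2.0, 4.0, 0.5)],  # probability 0.75 of lying in [1, 4]
    [(3.0, 5.0, 1.0)],                   # probability 0.5 of lying in [1, 4]
    [(10.0, 12.0, 1.0)],                 # probability 0 of lying in [1, 4]
]

print(range_query(points, 1.0, 4.0, 0.5))  # [0, 1]
```

Raising the threshold to 0.6 on the same data would report only the first point, since the second lies in the interval with probability exactly 0.5.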