DBPL
2009
Springer

General Database Statistics Using Entropy Maximization

Abstract. We propose a framework in which query sizes can be estimated from arbitrary statistical assertions on the data. In its most general form, a statistical assertion states that the size of the output of a conjunctive query over the data is a given number. A very simple example is a histogram, which makes assertions about the sizes of the outputs of several range queries. Our model also allows much more complex assertions that include joins and projections. To model such complex statistical assertions we propose to use the Entropy-Maximization (EM) probability distribution. In this model any consistent set of statistics has a precise semantics, and every query has a precise size estimate. We show that several classes of statistics can be solved in closed form.
Raghav Kaushik, Christopher Ré, Dan Suciu
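
The histogram case mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, and it makes several assumptions that are not in the abstract: a single unary relation over an integer domain, set semantics (each value occurs at most once), and disjoint buckets, each asserting an expected number of tuples. Under those assumptions, maximizing entropy subject to the bucket counts treats each possible tuple as independently present with probability count / bucket width, so a range-query estimate is the sum of those probabilities over the range. The names `tuple_probabilities` and `estimate_range` are hypothetical.

```python
from fractions import Fraction


def tuple_probabilities(buckets):
    """Map each possible value to its presence probability.

    buckets: list of (lo, hi, asserted_count) over disjoint integer
    ranges [lo, hi]. Assumption: each value in a bucket is equally
    likely to be present, which is the maximum-entropy choice when the
    only constraint is the expected count per bucket.
    """
    probs = {}
    for lo, hi, count in buckets:
        width = hi - lo + 1
        if not 0 <= count <= width:
            raise ValueError("asserted count is inconsistent with bucket width")
        p = Fraction(count, width)
        for value in range(lo, hi + 1):
            if value in probs:
                raise ValueError("buckets must be disjoint")
            probs[value] = p
    return probs


def estimate_range(probs, lo, hi):
    """Expected number of tuples with lo <= value <= hi."""
    return sum(p for value, p in probs.items() if lo <= value <= hi)


# Two histogram assertions: 4 tuples in [1, 10], 10 tuples in [11, 20].
probs = tuple_probabilities([(1, 10, 4), (11, 20, 10)])
estimate = estimate_range(probs, 6, 15)  # 5 * 4/10 + 5 * 1
```

Exact rational arithmetic is used so the estimate for a whole bucket reproduces the asserted count without rounding error. Assertions involving joins or projections, which the abstract also covers, do not factor this simply and are outside the scope of this sketch.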
Added 26 May 2010
Updated 26 May 2010
Type Conference
Year 2009
Where DBPL