When query optimizers erroneously assume that database columns are statistically independent, they can underestimate the selectivities of conjunctive predicates by orders of magnitude. Such underestimation often leads to drastically suboptimal query execution plans. We demonstrate cords, an efficient and scalable tool for automatic discovery of correlations and soft functional dependencies between column pairs. We apply cords to real, synthetic, and TPC-H benchmark data, and show that cords discovers correlations in an efficient and scalable manner. The output of cords can be visualized graphically, making cords a useful mining and analysis tool for database administrators. cords ranks the discovered correlated column pairs and recommends to the optimizer a set of statistics to collect for the “most important” of the pairs. Use of these statistics speeds up processing times by orders of magnitude for a wide range of queries.
Ihab F. Ilyas, Volker Markl, Peter J. Haas, Paul G