The rich dependency structure found in the columns of real-world relational databases can be exploited to great advantage, but it can also cause query optimizers, which usually assume that columns are statistically independent, to underestimate the selectivities of conjunctive predicates by orders of magnitude. We introduce CORDS, an efficient and scalable tool for automatic discovery of correlations and soft functional dependencies between columns. CORDS searches for column pairs that might have interesting and useful dependency relations by systematically enumerating candidate pairs and simultaneously pruning unpromising candidates using a flexible set of heuristics. A robust chi-squared analysis is applied to a sample of column values to identify correlations, and the number of distinct values in the sampled columns is analyzed to detect soft functional dependencies. CORDS can be used as a data mining tool, producing dependency graphs that are of intrinsic interest. We focus ...
Ihab F. Ilyas, Volker Markl, Peter J. Haas, Paul B
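The two tests named in the abstract, and the selectivity error that motivates them, can be illustrated with a minimal sketch. This is not the CORDS implementation: the function names, the toy make/model sample, and the distinct-count ratio used as a dependency strength are assumptions chosen for illustration, and the sketch omits the sampling, candidate pruning, and robustness adjustments the abstract describes.

```python
from collections import Counter


def chi_squared(pairs):
    """Pearson chi-squared statistic over the contingency table of (a, b) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    row = Counter(a for a, _ in pairs)
    col = Counter(b for _, b in pairs)
    stat = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n  # cell count if a and b were independent
            observed = joint.get((a, b), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def fd_strength(pairs):
    """Assumed strength measure for a soft dependency a -> b:
    distinct a values divided by distinct (a, b) pairs.
    A value of 1.0 means a determines b within the sample."""
    distinct_a = len({a for a, _ in pairs})
    distinct_ab = len(set(pairs))
    return distinct_a / distinct_ab


def selectivities(pairs, a_val, b_val):
    """Return (independence-based estimate, actual) selectivity of a = a_val AND b = b_val."""
    n = len(pairs)
    sel_a = sum(1 for a, _ in pairs if a == a_val) / n
    sel_b = sum(1 for _, b in pairs if b == b_val) / n
    actual = sum(1 for a, b in pairs if a == a_val and b == b_val) / n
    return sel_a * sel_b, actual


# Hypothetical sample of (make, model) values: model determines make.
sample = [("Honda", "Accord"), ("Honda", "Civic"),
          ("Toyota", "Camry"), ("Toyota", "Corolla")] * 5
model_make = [(model, make) for make, model in sample]

chi2 = chi_squared(sample)                      # large value suggests correlation
model_to_make = fd_strength(model_make)         # model -> make
make_to_model = fd_strength(sample)             # make -> model
estimate, actual = selectivities(sample, "Honda", "Accord")
```

On this toy sample the independence-based estimate for `make = 'Honda' AND model = 'Accord'` is half the true selectivity; with more columns and more distinct values the gap can grow much larger, which is the underestimation the abstract refers to.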