Query optimization is an integral part of relational database management systems. One important task in query optimization is selectivity estimation, that is, given a query P, we need to estimate the fraction of records in the database that satisfy P. Almost all previous work dealt with the estimation of numeric selectivity, i.e., the query contains only numeric variables. The general problem of estimating alphanumeric selectivity is much more difficult and has attracted attention only very recently, and the focus has been on the special case when only one column is involved. In this paper, we consider the more general case when there are two correlated alphanumeric columns. We develop efficient algorithms to build storage structures that can fit in a database catalog. Results from our extensive experiments to test our algorithms, on the basis of error analysis and space requirements, are given to guide DBMS implementors.
Min Wang, Jeffrey Scott Vitter, Balakrishna R. Iye