In a variety of applications ranging from optimizing queries on alphanumeric attributes to providing approximate counts of documents containing several query terms, there is an increasing need to quickly and reliably estimate the number of strings tuples, documents, etc. matching a Boolean query. Boolean queries in this context consist of substring predicates composed using Boolean operators. While there has been some work in estimating the selectivity of substring queries, the more general problem of estimating the selectivity of Boolean queries over substring predicates has not been studied. Our approach is to extract selectivity estimates from relationships between the substring predicates of the Boolean query. However, storing the correlation between all possible predicates in order to provide an exact answer to such predicates is clearly infeasible, as there is a super-exponential number of possible combinations of these predicates. Instead, our novel idea is to capture correla...
Zhiyuan Chen, Flip Korn, Nick Koudas, S. Muthukris