VLDB
2007
ACM

Extending Q-Grams to Estimate Selectivity of String Matching with Low Edit Distance

Many emerging database applications require accurate selectivity estimation for approximate string matching queries. Edit distance is one of the most commonly used string similarity measures. In this paper, we study the problem of estimating the selectivity of string matching with low edit distance. Our framework is based on extending q-grams with wildcards. Based on the concepts of a replacement semilattice, a string hierarchy, and a combinatorial analysis, we develop formulas for selectivity estimation and provide the algorithm BasicEQ. We then develop the algorithm OptEQ by enhancing BasicEQ with two novel improvements. Finally, we present a comprehensive set of experiments on three benchmarks comparing OptEQ with the state-of-the-art method SEPIA. Our experimental results show that OptEQ delivers more accurate selectivity estimates than SEPIA.
Hongrae Lee, Raymond T. Ng, Kyuseok Shim
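The abstract above defines selectivity in terms of edit distance and builds on q-grams extended with wildcards. The sketch below is a minimal illustration under stated assumptions, not the authors' BasicEQ or OptEQ. It computes the exact selectivity by brute force, meaning the fraction of stored strings within edit distance k of a query, which is the quantity any estimator tries to approximate without scanning the data. It also enumerates wildcard variants of a q-gram. The wildcard symbol `?`, the helper names `true_selectivity` and `wildcard_variants`, and the substitution-only wildcard model are all hypothetical choices made for illustration.

```python
from itertools import combinations

def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,               # delete cs
                            curr[j - 1] + 1,           # insert ct
                            prev[j - 1] + (cs != ct))) # substitute or match
        prev = curr
    return prev[-1]

def true_selectivity(query: str, k: int, strings: list[str]) -> float:
    """Hypothetical helper: exact fraction of strings within edit distance k.

    This brute-force scan is the ground truth an estimator approximates.
    """
    if not strings:
        return 0.0
    hits = sum(1 for s in strings if edit_distance(query, s) <= k)
    return hits / len(strings)

WILDCARD = "?"  # assumption: a wildcard matching any single character

def wildcard_variants(qgram: str, k: int) -> set[str]:
    """Hypothetical helper: variants of qgram with up to k positions wildcarded.

    Replacing a position by a wildcard models a substitution only;
    insertions and deletions are not covered by this sketch.
    """
    variants = set()
    for r in range(min(k, len(qgram)) + 1):
        for positions in combinations(range(len(qgram)), r):
            chars = list(qgram)
            for p in positions:
                chars[p] = WILDCARD
            variants.add("".join(chars))
    return variants

if __name__ == "__main__":
    data = ["smith", "smyth", "smithe", "jones"]
    print(true_selectivity("smith", 1, data))    # 0.75
    print(sorted(wildcard_variants("abc", 1)))   # ['?bc', 'a?c', 'ab?', 'abc']
```

The brute-force scan costs one edit-distance computation per stored string, which is exactly what a selectivity estimator is meant to avoid at query-optimization time; it is useful mainly for measuring an estimator's error on sample data.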
Type Conference
Year 2007
Where VLDB
Authors Hongrae Lee, Raymond T. Ng, Kyuseok Shim