Cost-based variable-length-gram selection for string collections to support approximate queries efficiently


Download www.db-infotech.cn

Approximate queries on a collection of strings are important in many applications such as record linkage, spell checking, and Web search, where inconsistencies and errors exist in data as well as queries. Several existing algorithms use the concept of "grams," which are substrings of strings used as signatures for the strings to build index structures. A recently proposed technique, called VGRAM, improves the performance of these algorithms by using a carefully chosen dictionary of variable-length grams based on their frequencies in the string collection. Since an index structure using fixed-length grams can be viewed as a special case of VGRAM, a fundamental problem arises naturally: what is the relationship between the gram dictionary and the performance of queries? We study this problem in this paper. We propose a dynamic programming algorithm for computing a tight lower bound on the number of common grams shared by two similar strings in order to improve query performanc...

Xiaochun Yang, Bin Wang, Chen Li
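
As a rough illustration of the fixed-length special case mentioned in the abstract, the sketch below extracts overlapping q-grams from two strings and compares the number of grams they share against the classic count-filter lower bound: if two strings are within edit distance k, they share at least max(|s1|, |s2|) - q + 1 - k*q grams. This is not the paper's dynamic programming algorithm for variable-length grams; the function names (qgrams, common_gram_count, count_filter_bound) and the sample strings are hypothetical and chosen only for illustration.

```python
from collections import Counter


def qgrams(s: str, q: int) -> Counter:
    """Return the bag (multiset) of all overlapping q-grams of s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def common_gram_count(s1: str, s2: str, q: int) -> int:
    """Size of the bag intersection of the q-gram bags of s1 and s2."""
    return sum((qgrams(s1, q) & qgrams(s2, q)).values())


def count_filter_bound(s1: str, s2: str, q: int, k: int) -> int:
    """Fixed-length count-filter bound (not the VGRAM bound).

    A string of length n has n - q + 1 grams, and one edit operation
    can destroy at most q of them, so strings within edit distance k
    share at least max(|s1|, |s2|) - q + 1 - k*q grams.
    """
    return max(len(s1), len(s2)) - q + 1 - k * q


if __name__ == "__main__":
    s1, s2 = "approximate", "aproximate"  # one deletion apart
    shared = common_gram_count(s1, s2, q=2)
    bound = count_filter_bound(s1, s2, q=2, k=1)
    print(f"shared grams: {shared}, lower bound: {bound}")
```

A candidate string that shares fewer grams with the query than this bound cannot be within the edit-distance threshold, so an index can discard it without computing the edit distance. A tighter bound prunes more candidates, which is the motivation the abstract gives for computing a tight lower bound in the variable-length setting.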


Approximate Queries | Approximate String Queries | Database | SIGMOD 2008 | Variable-length Grams


Post Info

Added	08 Dec 2009
Updated	08 Dec 2009
Type	Conference
Year	2008
Where	SIGMOD
Authors	Xiaochun Yang, Bin Wang, Chen Li
