VGRAM: Improving Performance of Approximate Queries on String Collections Using Variable-Length Grams



Many applications need to solve the problem of approximate string matching: given a collection of strings, how do we find those similar to a given string, or to the strings in another (possibly the same) collection? Many algorithms have been developed using fixed-length grams, which are substrings of a string used as signatures to identify similar strings. In this paper we develop a novel technique, called VGRAM, to improve the performance of these algorithms. Its main idea is to judiciously choose high-quality grams of variable lengths from a collection of strings to support queries on the collection. We give a full specification of this technique, including how to select high-quality grams from the collection, how to generate variable-length grams for a string based on the preselected grams, and how the similarity of the gram sets of two strings relates to their edit distance. A primary advantage of the technique is that it can be adopted by a plethora ...
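For readers unfamiliar with gram-based matching, the fixed-length baseline that the abstract refers to can be sketched as follows. This is not the VGRAM algorithm itself, only a minimal illustration of fixed-length grams and of the well-known count-filter bound that links shared grams to edit distance. The function names, the choice of q = 2, and the unpadded gram definition are all assumptions made for this sketch.

```python
from collections import Counter


def qgrams(s: str, q: int) -> list[str]:
    """Return all overlapping substrings of length q (no padding; an assumption)."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def common_gram_count(a: str, b: str, q: int) -> int:
    """Size of the multiset intersection of the two gram bags."""
    overlap = Counter(qgrams(a, q)) & Counter(qgrams(b, q))
    return sum(overlap.values())


def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def count_filter_bound(a: str, b: str, q: int, k: int) -> int:
    """Lower bound on shared grams when edit_distance(a, b) <= k.

    Each edit operation can destroy at most q grams of the longer string.
    """
    return max(len(a), len(b)) - q + 1 - k * q


if __name__ == "__main__":
    a, b, q = "approximate", "aproximate", 2
    k = edit_distance(a, b)
    print(k, common_gram_count(a, b, q), count_filter_bound(a, b, q, k))
```

A pair of strings whose shared-gram count falls below the bound for a threshold k cannot be within edit distance k, so it can be discarded without computing the edit distance. According to the abstract, VGRAM replaces the fixed q with preselected grams of variable lengths; that selection step is not reproduced here.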

Chen Li, Bin Wang, Xiaochun Yang


Database | VLDB 2007 |


» Space-Constrained Gram-Based Indexing for Efficient Approximate String Search

» Efficient Merging and Filtering Algorithms for Approximate String Searches

Post Info

Added	05 Dec 2009
Updated	05 Dec 2009
Type	Conference
Year	2007
Where	VLDB
Authors	Chen Li, Bin Wang, Xiaochun Yang
