Sciweavers

ICDE
2012
IEEE

Approximate String Membership Checking: A Multiple Filter, Optimization-Based Approach

We consider the approximate string membership checking (ASMC) problem of extracting all the strings or substrings in a document that approximately match some string in a given dictionary. To solve this problem, the current state-of-the-art approach first applies a fast, approximate filter and then applies a more expensive exact verification algorithm to the strings that pass the filter. Correspondingly, many string filters have been proposed. We note that different filters are good at eliminating different strings, depending on the characteristics of the strings in both the documents and the dictionary. We suspect that no single filter will dominate all other filters everywhere. Given an ASMC problem instance and a set of string filters, we need to select the optimal filter to maximize performance. Furthermore, in our experiments we found that in some cases a sequence of filters dominates any of the filters of the sequence in isolation, and that the best set of ...
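The filter-then-verify idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation or their optimization procedure. The choice of a length filter and a q-gram count filter, edit distance as the similarity measure, whitespace tokens as candidate strings, and every function name (`length_filter`, `count_filter`, `asmc`) are assumptions made for illustration only.

```python
from collections import Counter


def edit_distance(a, b):
    """Exact (expensive) verification: Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def length_filter(s, t, k):
    """Cheap filter: strings within edit distance k differ in length by at most k."""
    return abs(len(s) - len(t)) <= k


def qgrams(s, q):
    """Multiset of overlapping q-grams of s (no padding)."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def count_filter(s, t, k, q=2):
    """Cheap filter: strings within edit distance k share at least
    max(|s|, |t|) - q + 1 - k*q q-grams, since each edit destroys at most q."""
    need = max(len(s), len(t)) - q + 1 - k * q
    if need <= 0:
        return True  # the bound is vacuous, so the filter cannot prune
    shared = sum((qgrams(s, q) & qgrams(t, q)).values())
    return shared >= need


def asmc(document, dictionary, k, filters):
    """Hypothetical driver: apply the filters in order, then verify the survivors.
    Candidates are whitespace tokens, a simplification of 'strings or substrings'."""
    matches = []
    for token in document.split():
        for entry in dictionary:
            if all(f(token, entry, k) for f in filters):
                if edit_distance(token, entry) <= k:
                    matches.append((token, entry))
    return matches


result = asmc("the databse sytem", ["database", "system"], k=1,
              filters=[length_filter, count_filter])
```

Because `all` short-circuits, the order of `filters` controls how often the later, costlier filters run. Both filters here are safe (they never reject a true match), so reordering changes only the running time, not the result. That cost trade-off is what makes the choice and ordering of filters a performance question.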
Chong Sun, Jeffrey F. Naughton, Siddharth Barman
Added 28 Sep 2012
Updated 28 Sep 2012
Type Journal
Year 2012
Where ICDE