
COLING
2010

Simple and Efficient Algorithm for Approximate Dictionary Matching

This paper presents a simple and efficient algorithm for approximate dictionary matching designed for similarity measures such as the cosine, Dice, Jaccard, and overlap coefficients. We propose this algorithm, called CPMerge, for the τ-overlap join of inverted lists. First, we show that approximate dictionary matching is solvable exactly by a τ-overlap join. Given the inverted lists retrieved for a query, the algorithm collects fewer candidate strings and prunes unlikely candidates to efficiently find the strings that satisfy the constraint of the τ-overlap join. We conducted approximate dictionary matching experiments on three large-scale datasets that include person names, biomedical names, and general English words. The algorithm exhibited scalable performance on the datasets.
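The overlap join described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`ngrams`, `build_index`, `overlap_join`), the use of padded character n-gram sets as features, and the in-memory dictionary index are all assumptions made for illustration. The sketch takes the overlap threshold `tau` directly; deriving it from a cosine, Dice, or Jaccard threshold is left out. It relies on one observation: a string sharing at least `tau` features with the query must appear in at least one of the `m - tau + 1` shortest of the query's `m` inverted lists, so only those lists are scanned to collect candidates, and the longer lists are probed by binary search.

```python
from bisect import bisect_left
from collections import defaultdict


def ngrams(s, n=3):
    """Set of padded character n-grams (hypothetical feature extractor)."""
    padded = "$" * (n - 1) + s + "$" * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def build_index(strings, n=3):
    """Map each feature to a sorted list of ids of the strings containing it."""
    index = defaultdict(list)
    for sid, s in enumerate(strings):
        for g in ngrams(s, n):
            index[g].append(sid)  # ids arrive in increasing order, so lists stay sorted
    return index


def overlap_join(query_features, index, tau):
    """Return ids of strings sharing at least `tau` features with the query."""
    # Process the inverted lists from shortest to longest.
    lists = sorted((index.get(f, []) for f in query_features), key=len)
    m = len(lists)
    if tau < 1 or tau > m:
        return []

    # Candidate phase: scan only the m - tau + 1 shortest lists.
    split = m - tau + 1
    counts = defaultdict(int)
    for lst in lists[:split]:
        for sid in lst:
            counts[sid] += 1

    results = [sid for sid, c in counts.items() if c >= tau]
    candidates = {sid: c for sid, c in counts.items() if c < tau}

    # Verification phase: probe the remaining (longer) lists by binary search.
    for k in range(split, m):
        lst = lists[k]
        remaining = m - k - 1  # lists still unseen after this one
        survivors = {}
        for sid, c in candidates.items():
            pos = bisect_left(lst, sid)
            if pos < len(lst) and lst[pos] == sid:
                c += 1
            if c >= tau:
                results.append(sid)       # constraint satisfied
            elif c + remaining >= tau:
                survivors[sid] = c        # can still reach tau
            # otherwise the candidate is pruned
        candidates = survivors
    return sorted(results)
```

For example, with `index = build_index(names)` for some list of strings `names`, calling `overlap_join(ngrams(query), index, tau)` returns the ids of the entries that share at least `tau` trigrams with `query`.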
Naoaki Okazaki, Jun-ichi Tsujii
Added 13 May 2011
Updated 13 May 2011
Type Conference
Year 2010
Where COLING