
FUIN
2006

On-line Approximate String Matching in Natural Language

We consider approximate pattern matching in natural language text. We use the words of the text as the alphabet, instead of the characters as in traditional string matching approaches. Hence our pattern consists of a sequence of words. From the algorithmic point of view this has several advantages: (i) the number of words is much smaller than the number of characters, which in effect means a shorter text (fewer possible matching positions); (ii) the pattern effectively becomes shorter, so bit-parallel techniques become more applicable; (iii) the alphabet size becomes much larger, so the probability that two symbols (in this case, words) match is reduced. We extend several known approximate string matching algorithms for this scenario, allowing k insertions, deletions or substitutions of symbols (natural language words). We further extend the algorithms to allow k' errors inside the pattern symbols (words) as well. The two error thresholds k and k' can be applied simultaneously and independentl...
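As a rough illustration of the two-level idea in the abstract, the sketch below runs a plain dynamic-programming search over word tokens: at most k word insertions, deletions or substitutions are allowed, and two words count as equal when their character-level edit distance is within a second, inner threshold. This is not the paper's algorithm (the abstract refers to extended bit-parallel and other known methods). The function names, the `k_inner` parameter and whitespace tokenization are assumptions made for this example only.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (here: two words)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


def word_level_search(pattern, text, k, k_inner=0):
    """Return indices of text words where an approximate match ends.

    Hypothetical helper: words are the alphabet symbols, k bounds the
    word-level errors, k_inner bounds the character errors tolerated
    inside a single word. Tokenization is a plain whitespace split.
    """
    p = pattern.split()
    t = text.split()
    m = len(p)
    col = list(range(m + 1))      # column of the DP matrix for the empty text
    ends = []
    for pos, word in enumerate(t):
        new = [0]                 # a match may start at any text word
        for i in range(1, m + 1):
            same = edit_distance(p[i - 1], word) <= k_inner
            new.append(min(col[i - 1] + (0 if same else 1),  # match / substitute
                           col[i] + 1,                       # extra text word
                           new[i - 1] + 1))                  # missing pattern word
        col = new
        if col[m] <= k:
            ends.append(pos)
    return ends
```

For example, with the pattern "approximate string matching" and the text "we study approximate strng matching in text", calling `word_level_search` with `k=0, k_inner=0` finds nothing, while either `k=0, k_inner=1` (tolerating the misspelled word) or `k=1, k_inner=0` (tolerating one word substitution) reports a match ending at word index 4. The two thresholds are independent inputs, mirroring the abstract's description.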
Kimmo Fredriksson
Added 12 Dec 2010
Updated 12 Dec 2010
Type Journal
Year 2006
Where FUIN
Authors Kimmo Fredriksson