Indexing Text with Approximate q-Grams

We present a new index for approximate string matching. The index collects text q-samples, that is, disjoint text substrings of length q, at fixed intervals and stores their positions. At search time, part of the text is filtered out by noticing that any occurrence of the pattern must be reflected in the presence of some text q-samples that match approximately inside the pattern. Hence the index points out the text areas that could contain occurrences and must be verified. The index parameters permit load balancing between filtering and verification work, and provide a compromise between the space requirement of the index and the error level for which the filtration is still efficient. We show experimentally that the index is competitive against others that take more space, being in fact the fastest choice for intermediate error levels, an area where no current index is useful.

Key words: Approximate string matching, text databases, q-gram indices.
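The filtering idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the sampling step h, and the simplified filter condition are assumptions made for illustration. The sketch relies on a standard counting argument: an occurrence with at most k errors has length at least m - k, so it contains at least s = floor((m - k - q + 1) / h) disjoint samples, and at least one of them must match some substring of the pattern with at most floor(k / s) errors. Areas around such samples are then verified with the classical dynamic programming algorithm.

```python
from collections import defaultdict


def build_sample_index(text, q, h):
    """Map each q-gram sampled every h characters to its start positions.

    q (sample length) and h (sampling step) are illustrative parameter names.
    """
    if h < q:
        raise ValueError("samples must be disjoint: need h >= q")
    index = defaultdict(list)
    for pos in range(0, len(text) - q + 1, h):
        index[text[pos:pos + q]].append(pos)
    return index


def sellers_row(pattern, text):
    """Last row of the approximate-search DP table.

    Entry j is the least edit distance between `pattern` and any
    substring of `text` that ends just before index j.
    """
    prev = [0] * (len(text) + 1)  # an occurrence may start anywhere
    for i, pc in enumerate(pattern, 1):
        cur = [i]
        for j, tc in enumerate(text, 1):
            cur.append(min(prev[j - 1] + (pc != tc),  # match or substitution
                           prev[j] + 1,               # skip a pattern character
                           cur[j - 1] + 1))           # skip a text character
        prev = cur
    return prev


def search(text, index, q, h, pattern, k):
    """Return sorted exclusive end positions of occurrences with <= k errors."""
    m = len(pattern)
    s = (m - k - q + 1) // h  # samples guaranteed inside any occurrence
    if s < 1:
        raise ValueError("pattern too short for this q, h and k")
    threshold = k // s  # some sample must match within this many errors
    ends = set()
    for gram, positions in index.items():
        # Filtering: best match of the sample anywhere inside the pattern.
        if min(sellers_row(gram, pattern)) > threshold:
            continue
        for p in positions:
            # Verification: any occurrence covering this sample lies here.
            lo = max(0, p + q - (m + k))
            hi = min(len(text), p + m + k)
            row = sellers_row(pattern, text[lo:hi])
            ends.update(lo + j for j, d in enumerate(row) if d <= k)
    return sorted(ends)
```

A call such as `search(text, build_sample_index(text, 3, 3), 3, 3, "brwn fox jumps", 2)` returns the same end positions as running `sellers_row` over the whole text, while verifying only the areas the samples point to. Raising h shrinks the index but lowers s, which raises the per-sample error threshold and weakens the filter; this is the space versus error-level trade-off the abstract mentions.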

Gonzalo Navarro, Erkki Sutinen, Jani Tanninen, Jorma Tarhio

Approximate String Matching | Combinatorics | CPM 2000 | Index Collects Text | Text Q-samples

» Indexing Variable Length Substrings for Exact and Approximate Matching

» Block Addressing Indices for Approximate Text Retrieval

» Finding Range Minima in the Middle: Approximations and Applications

» Approximate String Matching with Lempel-Ziv Compressed Indexes

» Metric Indexing for the Vector Model in Text Retrieval

» Straightforward Feature Selection for Scalable Latent Semantic Indexing

» Indexing text data under space constraints

» Performance Analysis of Distributed Architectures to Index One Terabyte of Text

Post Info

Added	02 Aug 2010
Updated	02 Aug 2010
Type	Conference
Year	2000
Where	CPM
Authors	Gonzalo Navarro, Erkki Sutinen, Jani Tanninen, Jorma Tarhio
