Space-economical partial gram indices for exact substring matching


Exact substring matching queries on large data collections can be answered using q-gram indices, which store for each occurring q-byte pattern an (ordered) posting list with the positions of all its occurrences. Such gram indices are known to provide fast query response times and to allow the index to be created quickly, even on huge disk-based datasets. Their main drawback is their relatively large storage footprint, which is a constant multiple (typically > 2) of the original data size, even when compression is used. In this work, we study methods that preserve the scalable creation time and efficient exact substring query properties of gram indices while reducing storage space. To this end, we first propose a partial gram index based on a reduction from the problem of omitting indexed q-grams to the set cover problem. While this method succeeds in reducing the size of the index, it generates false positives at query time, which reduces efficiency. We then increase the accuracy of partial grams by...
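
To make the abstract's terms concrete, the following is a minimal in-memory sketch of a q-gram index with ordered posting lists, and of how omitting grams from the index can produce false positives that must be verified against the data. It is an illustration only, not the authors' implementation: the function names, the `omitted` parameter, and the sample data are assumptions made for this example, and the grams to omit are picked by hand here rather than through the set cover reduction the paper describes.

```python
from collections import defaultdict


def build_gram_index(data: bytes, q: int, omitted=frozenset()) -> dict:
    """Map each q-byte gram to the ordered list of positions where it starts.

    Grams listed in `omitted` (a hypothetical parameter) are left out,
    which yields a partial index.
    """
    index = defaultdict(list)
    for pos in range(len(data) - q + 1):
        gram = data[pos:pos + q]
        if gram not in omitted:
            index[gram].append(pos)  # appended in increasing order
    return dict(index)


def candidate_starts(index: dict, q: int, pattern: bytes, data_len: int,
                     omitted=frozenset()) -> list:
    """Intersect posting lists, shifted by each gram's offset in the pattern."""
    m = len(pattern)
    if m < q:
        raise ValueError("pattern must be at least q bytes long")
    candidates = None
    for i in range(m - q + 1):
        gram = pattern[i:i + q]
        if gram in omitted:
            continue  # nothing is known about this gram in a partial index
        starts = {p - i for p in index.get(gram, ())}
        candidates = starts if candidates is None else candidates & starts
        if not candidates:
            return []
    if candidates is None:
        # Every gram of the pattern was omitted: all positions are candidates.
        candidates = set(range(data_len - m + 1))
    return sorted(s for s in candidates if 0 <= s <= data_len - m)


def find_substring(data: bytes, index: dict, q: int, pattern: bytes,
                   omitted=frozenset()) -> list:
    """Verify candidates against the data to discard false positives."""
    cands = candidate_starts(index, q, pattern, len(data), omitted)
    return [s for s in cands if data[s:s + len(pattern)] == pattern]


data = b"banana bandana"
q = 2

full = build_gram_index(data, q)
print(find_substring(data, full, q, b"ana"))   # [1, 3, 11]

# Partial index: the gram b"nd" is omitted by hand for illustration.
omitted = frozenset({b"nd"})
partial = build_gram_index(data, q, omitted)
print(candidate_starts(partial, q, b"and", len(data), omitted))  # [1, 3, 8, 11]
print(find_substring(data, partial, q, b"and", omitted))         # [8]
```

With the full index, intersecting the shifted posting lists of all grams already yields exact matches. With the partial index, the query for `and` can only use the gram `an`, so positions 1, 3, and 11 survive as false positives until the verification step reads the data, which illustrates the efficiency cost the abstract mentions.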

Nan Tang, Lefteris Sidirourgos, Peter A. Boncz


CIKM 2009 | Exact Substring | Gram Indices | Such Gram Indices |


Post Info

Added	24 Jul 2010
Updated	24 Jul 2010
Type	Conference
Year	2009
Where	CIKM
Authors	Nan Tang, Lefteris Sidirourgos, Peter A. Boncz
