
CIKM
2009
Springer

Space-economical partial gram indices for exact substring matching

Exact substring matching queries on large data collections can be answered using q-gram indices, which store, for each occurring q-byte pattern, an (ordered) posting list with the positions of all its occurrences. Such gram indices are known to provide fast query response times and can be created quickly even on huge disk-based datasets. Their main drawback is their relatively large storage footprint, which is a constant multiple (typically > 2) of the original data size, even when compression is used. In this work, we study methods that preserve the scalable creation time and efficient exact substring query properties of gram indices while reducing storage space. To this end, we first propose a partial gram index based on a reduction from the problem of omitting indexed q-grams to the set cover problem. While this method succeeds in reducing the size of the index, it generates false positives at query time, which reduces efficiency. We then increase the accuracy of partial grams by...
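To make the baseline concrete, here is a minimal in-memory sketch of a full (not partial) q-gram index and an exact substring lookup over it. It is an illustration only, not the authors' implementation: the function names `build_qgram_index` and `find_all` are hypothetical, the index lives in a Python dictionary rather than on disk, and no compression or set-cover-based gram omission is attempted.

```python
from collections import defaultdict


def build_qgram_index(data: bytes, q: int) -> dict:
    """Map every q-byte pattern occurring in data to its ordered posting list."""
    index = defaultdict(list)
    for pos in range(len(data) - q + 1):
        # Positions are appended in increasing order, so each list stays sorted.
        index[data[pos:pos + q]].append(pos)
    return dict(index)


def find_all(index: dict, q: int, pattern: bytes) -> list:
    """Return all start positions of pattern, using only the posting lists.

    Assumes a full index (every occurring q-gram is present) and
    len(pattern) >= q.
    """
    if len(pattern) < q:
        raise ValueError("pattern must be at least q bytes long")
    # Start positions suggested by the first gram of the pattern.
    candidates = set(index.get(pattern[:q], ()))
    # Each later gram must occur at the matching offset from a candidate start.
    for offset in range(1, len(pattern) - q + 1):
        postings = index.get(pattern[offset:offset + q])
        if not postings:
            return []
        candidates &= {p - offset for p in postings}
        if not candidates:
            return []
    return sorted(candidates)


if __name__ == "__main__":
    idx = build_qgram_index(b"abracadabra", 3)
    print(find_all(idx, 3, b"abra"))
```

Because every overlapping gram of the pattern is checked at its offset, a full index needs no verification pass against the data. Once some grams are omitted, as in the partial index described above, the surviving candidates may include false positives and would have to be confirmed against the original data.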
Nan Tang, Lefteris Sidirourgos, Peter A. Boncz
Added 24 Jul 2010
Updated 24 Jul 2010
Type Conference
Year 2009
Where CIKM
Authors Nan Tang, Lefteris Sidirourgos, Peter A. Boncz