ICDE
2009
IEEE

Space-Constrained Gram-Based Indexing for Efficient Approximate String Search

Abstract: Answering approximate queries on string collections is important in applications such as data cleaning, query relaxation, and spell checking, where inconsistencies and errors exist in user queries as well as data. Many existing algorithms use gram-based inverted-list indexing structures to answer approximate string queries. These indexing structures are "notoriously" large compared to the size of their original string collection. In this paper, we study how to reduce the size of such an indexing structure to a given amount of space, while retaining efficient query processing. We first study how to adopt existing inverted-list compression techniques to solve our problem. Then, we propose two novel approaches for achieving the goal: one is based on discarding gram lists, and one is based on combining correlated lists. They are both orthogonal to existing compression techniques, exploit a unique property of our setting, and offer new opportunities for improving query ...
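For readers unfamiliar with the setting, the following is a minimal sketch of the kind of gram-based inverted-list index the abstract refers to. It is not the paper's method and does not implement list discarding or list combining. It assumes unpadded 2-grams, edit distance as the similarity measure, and the standard count filter (a string within edit distance k of the query must share at least |grams(query)| - k*q grams with it). All function names here are hypothetical.

```python
from collections import defaultdict

def grams(s, q=2):
    """Return the overlapping q-grams of s (no padding; an assumption of this sketch)."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]

def build_index(strings, q=2):
    """Map each q-gram to the sorted list of ids of the strings that contain it."""
    index = defaultdict(set)
    for sid, s in enumerate(strings):
        for g in grams(s, q):
            index[g].add(sid)
    return {g: sorted(ids) for g, ids in index.items()}

def edit_distance(a, b):
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def search(query, strings, index, k, q=2):
    """Return the ids of strings within edit distance k of query."""
    qgrams = grams(query, q)
    # Count filter: each edit operation destroys at most q grams of the query.
    threshold = len(qgrams) - k * q
    if threshold <= 0:
        candidates = range(len(strings))  # the filter cannot prune; scan everything
    else:
        counts = defaultdict(int)
        for g in qgrams:
            for sid in index.get(g, ()):
                counts[sid] += 1
        candidates = [sid for sid, c in counts.items() if c >= threshold]
    # Verify the surviving candidates with the exact distance.
    return sorted(sid for sid in candidates if edit_distance(query, strings[sid]) <= k)

strings = ["approximate", "appropriate", "string", "strong", "search"]
index = build_index(strings)
print(search("aproximate", strings, index, k=1))
print(search("strang", strings, index, k=1))
```

Every string id appears in one list per distinct gram it contains, so the total length of the lists grows with the total length of the collection, which gives a rough sense of why such indexes become large.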
Alexander Behm, Shengyue Ji, Chen Li, Jiaheng Lu
Added 20 Oct 2009
Updated 20 Oct 2009
Type Conference
Year 2009
Where ICDE