Sciweavers

ICDE
2011
IEEE

Answering approximate string queries on large data sets using external memory

An approximate string query finds, from a collection of strings, those that are similar to a given query string. Answering such queries is important in many applications such as data cleaning and record linkage, where errors can occur in the queries as well as in the data. Many existing algorithms have focused on in-memory indexes. In this paper we investigate how to efficiently answer such queries in a disk-based setting, by systematically studying the effects of storing data and indexes on disk. We devise a novel physical layout for an inverted index to answer queries and we study how to construct it with limited buffer space. To answer queries, we develop a cost-based, adaptive algorithm that balances the I/O costs of retrieving candidate matches and accessing inverted lists. Experiments on large, real datasets verify that simply adapting existing algorithms to a disk-based setting does not work well and that our new techniques answer queries efficiently. Further, our solutions...
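
As a rough illustration of the kind of query the abstract describes, the Python sketch below answers an approximate string query with a small in-memory inverted index. It is a toy under stated assumptions, not the paper's disk-based layout or its cost-based adaptive algorithm. The abstract does not say how strings are tokenized or compared, so the q-gram tokens, the count filter, edit distance as the similarity measure, and all function names here are assumptions chosen for illustration.

```python
from collections import defaultdict


def qgrams(s, q=2):
    """Return the overlapping q-grams of s (no padding characters)."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def build_index(strings, q=2):
    """Inverted index: q-gram -> sorted list of ids of strings containing it."""
    index = defaultdict(set)
    for sid, s in enumerate(strings):
        for g in qgrams(s, q):
            index[g].add(sid)
    return {g: sorted(ids) for g, ids in index.items()}


def edit_distance(a, b):
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def search(strings, index, query, k, q=2):
    """Return the strings within edit distance k of query."""
    # Count filter: one edit destroys at most q of the query's q-grams, so a
    # true match must still contain at least this many of them.
    threshold = len(query) - q + 1 - k * q
    if threshold <= 0:
        # The filter cannot prune anything; fall back to checking every string.
        candidates = range(len(strings))
    else:
        counts = defaultdict(int)
        for g in qgrams(query, q):
            for sid in index.get(g, ()):
                counts[sid] += 1
        candidates = sorted(sid for sid, c in counts.items() if c >= threshold)
    # Verification step: compute the real distance for surviving candidates.
    return [strings[sid] for sid in candidates
            if edit_distance(strings[sid], query) <= k]


if __name__ == "__main__":
    names = ["michael", "micheal", "michelle", "alexander", "chen"]
    idx = build_index(names)
    print(search(names, idx, "michael", k=2))
```

The sketch has two phases: scanning inverted lists to collect candidates, then fetching each candidate string to verify it. In a disk-based setting both phases cost I/O, which is the trade-off the abstract says its cost-based algorithm balances.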
Alexander Behm, Chen Li, Michael J. Carey
Added 21 Aug 2011
Updated 21 Aug 2011
Type Journal
Year 2011
Where ICDE
Authors Alexander Behm, Chen Li, Michael J. Carey