B-trees are the data structure of choice for maintaining searchable data on disk. However, B-trees perform suboptimally ? when keys are long or of variable length, ? when keys are compressed, even when using front compression, the standard B-tree compression scheme, ? for range queries, and ? with respect to memory effects such as disk prefetching. This paper presents a cache-oblivious string B-tree (COSB-tree) data structure that is efficient in all these ways: ? The COSB-tree searches asymptotically optimally and inserts and deletes nearly optimally. ? It maintains an index whose size is proportional to the frontcompressed size of the dictionary. Furthermore, unlike standard front-compressed strings, keys can be decompressed in a memory-efficient manner. ? It performs range queries with no extra disk seeks; in contrast, B-trees incur disk seeks when skipping from leaf block to leaf block. ? It utilizes all levels of a memory hierarchy efficiently and makes good use of disk locality ...
Michael A. Bender, Martin Farach-Colton, Bradley C