In this paper we explore database segmentation in the context of a column-store DBMS targeted at a scientific database. We present a novel hardware- and scheme-oblivious segmentation algorithm, which learns and adapts to the workload immediately. The approach is to capitalize on (intermediate) query results, so that future queries benefit from a more appropriate data layout. The algorithm is implemented as an extension of a complete DBMS and evaluated against a real-life workload. It demonstrates significant performance gains without DBA assistance.

Emerging column-store database systems [1], [2], [3] call for revisiting the predominant segmentation techniques to cope with resource limitations, e.g., disk I/O, network, memory, and CPU. Because these systems deal with single columns only, their solution space is less complex, which improves the outlook for a self-organizing data segmentation scheme. In this paper we present the design and evaluation of an adaptive data segmentation technique...
Milena Ivanova, Martin L. Kersten, Niels Nes