Abstract. Data declustering speeds up retrieval of large data sets by partitioning the data across multiple disks or sites and performing retrievals in parallel. Performance is determined by how the data is broken into "buckets" and how the buckets are assigned to disks. While some work has been done on declustering uniformly distributed low-dimensional data, little work has been done on declustering non-uniform high-dimensional data. To decluster non-uniform data, a distribution-sensitive bucketing algorithm is crucial for achieving good performance. In this paper we propose a simple and efficient data-distribution-sensitive bucketing algorithm. Our method uses shifted Hilbert curves to adapt to the underlying data distribution. Our proposed declustering algorithm performs well compared with previous work, which has mostly focused on bucket-to-disk allocation schemes. Our experimental results show that the proposed declustering algorithm achieves a ...
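To make the idea of distribution-sensitive bucketing concrete, the sketch below orders points along a Hilbert curve and cuts that order into buckets holding equal numbers of points, so dense regions get small buckets and sparse regions get large ones. It is a minimal illustration under stated assumptions, not the authors' algorithm: it works in 2D for brevity, assumes coordinates in [0, 1), does not reproduce the shifting of Hilbert curves described above, and uses simple round-robin as a placeholder bucket-to-disk allocation. The names `hilbert_key` and `decluster` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def hilbert_key(order: int, x: int, y: int) -> int:
    """Return the Hilbert-curve index of grid cell (x, y) on a 2^order x 2^order grid."""
    n = 1 << order
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        # Each quadrant at this level covers s*s consecutive curve positions.
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip so the sub-curve in this quadrant has canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def decluster(points: Sequence[Point], order: int, num_buckets: int,
              num_disks: int) -> Tuple[List[List[Point]], List[int]]:
    """Hypothetical sketch: equal-count Hilbert-order buckets, round-robin disks."""
    n = 1 << order

    def cell(p: Point) -> Tuple[int, int]:
        # Snap a point in [0, 1)^2 to a grid cell, clamping the upper edge.
        return min(int(p[0] * n), n - 1), min(int(p[1] * n), n - 1)

    # Sort points along the Hilbert curve; nearby points tend to stay close.
    ordered = sorted(points, key=lambda p: hilbert_key(order, *cell(p)))

    # Split the curve order into buckets of (nearly) equal point counts,
    # which is what makes the bucketing adapt to the data distribution.
    total = len(ordered)
    buckets = [ordered[b * total // num_buckets:(b + 1) * total // num_buckets]
               for b in range(num_buckets)]

    # Placeholder allocation: bucket b goes to disk b mod num_disks.
    disks = [b % num_disks for b in range(num_buckets)]
    return buckets, disks
```

For example, `decluster(points, order=4, num_buckets=4, num_disks=3)` returns four buckets whose sizes differ by at most one point, assigned to disks `[0, 1, 2, 0]`. A real declustering scheme would replace the round-robin step with an allocation that spreads spatially adjacent buckets across different disks.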
Hak-Cheol Kim, Mario A. Lopez, Scott T. Leutenegge