Parallel independent disks can enhance the performance of external memory (EM) algorithms, but the programming task is often di cult. In this paper we develop randomized variants of distribution sort for use with parallel independent disks. We propose a simple variant called randomized cycling distribution sort (RCD) and prove that it has optimal expected I/O complexity. The analysis uses a novel reduction to a model with signi cantly fewer probabilistic interdependencies. Experimental evidence is provided to support its practicality. Other simple variants are also examined experimentally and appear to o er similar advantages to RCD. Based upon ideas in RCD we propose general techniques that transparently simulate algorithms developed for the unrealistic multihead disk model so that they can be run on the realistic parallel disk model. The simulation is optimal for two importantclasses of algorithms: the class of multipass algorithms, which make a complete pass through their data befo...
Jeffrey Scott Vitter, David A. Hutchinson