We develop an algorithm for parallel disk sorting whose I/O cost approaches the lower bound and which guarantees almost perfect overlap between I/O and computation. Previous algorithms either have suboptimal I/O volume or cannot guarantee that I/O and computation can always be overlapped. We give an efficient implementation that is at least competitive with the best practical implementations while offering additional performance guarantees. For the experiments, we configured a state-of-the-art, very cost-effective machine that can sustain full-bandwidth I/O with eight disks.

Categories and Subject Descriptors: D.4.2 [Storage Management]: secondary storage; E.5 [Files]: sorting/searching; F.2.2 [Nonnumerical Algorithms and Problems]: sorting and searching

General Terms: algorithms, performance, theory

Keywords: algorithm engineering, algorithm library, external memory sorting, large data sets, overlapping I/O and computation, parallel disks, prefetching, randomized algorithm, secon...
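The generic idea of overlapping I/O with computation through prefetching can be illustrated with a small external merge sort. The sketch below is a minimal, hypothetical illustration and not the algorithm described in this paper. It assumes a single simulated disk, where each sorted run is an in-memory list of blocks. The names `write_runs`, `prefetching_reader`, and `external_sort`, and the parameters `BLOCK`, `mem`, and `depth`, are all illustrative choices. A background thread reads up to `depth` blocks of each run ahead of the merge, so the merging is not stalled waiting for the next block. The sketch does not model multiple disks, the paper's prefetch scheduling, or its I/O-volume analysis.

```python
import heapq
import queue
import threading
from typing import Iterator, List

# Hypothetical block size (elements per simulated "disk block").
BLOCK = 4

_DONE = object()  # sentinel marking the end of a run


def write_runs(data: List[int], mem: int) -> List[List[List[int]]]:
    """Run formation: sort memory-sized chunks of the input.

    Each run is stored as a list of blocks, standing in for blocks on disk.
    """
    runs = []
    for i in range(0, len(data), mem):
        run = sorted(data[i:i + mem])
        runs.append([run[j:j + BLOCK] for j in range(0, len(run), BLOCK)])
    return runs


def prefetching_reader(blocks: List[List[int]], depth: int = 2) -> Iterator[int]:
    """Yield one run's elements while a background thread fetches blocks ahead.

    The bounded queue holds at most `depth` prefetched blocks (depth=2 is
    classic double buffering), so fetching overlaps with consumption.
    """
    q: queue.Queue = queue.Queue(maxsize=depth)

    def fetch() -> None:
        for b in blocks:
            q.put(list(b))  # simulated read of one block from disk
        q.put(_DONE)

    threading.Thread(target=fetch, daemon=True).start()
    while True:
        b = q.get()
        if b is _DONE:
            return
        yield from b


def external_sort(data: List[int], mem: int = 8) -> List[int]:
    """Two-phase external sort: form sorted runs, then k-way merge them.

    Every run is consumed through its own prefetching reader.
    """
    runs = write_runs(data, mem)
    return list(heapq.merge(*(prefetching_reader(r) for r in runs)))


if __name__ == "__main__":
    print(external_sort([5, 3, 9, 1, 7, 2, 8, 6, 4, 0, 11, 10]))
```

The simulated reads here only demonstrate the control flow. With real file reads, CPython releases the GIL during blocking I/O, so a reader thread can genuinely overlap disk transfers with the merge running in the main thread.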