We introduce a new deterministic parallel sorting algorithm based on the regular sampling approach. The algorithm uses only two rounds of regular all-to-all personalized communication in a scheme that yields very good load balancing with virtually no overhead. Moreover, unlike previous variations, our algorithm efficiently handles the presence of duplicate values without the overhead of tagging each element with a unique identifier. This algorithm was implemented in Split-C and run on a variety of platforms, including the Thinking Machines CM-5, the IBM SP-2-WN, and the Cray Research T3D. We ran our code using widely different benchmarks to examine the dependence of our algorithm on the input distribution. Our experimental results illustrate the efficiency and scalability of our algorithm across different platforms. In fact, the performance compares closely to that of our random sample sort algorithm, which seems to outperform all similar algorithms known to the authors on these platforms. T...
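To make the regular sampling idea concrete, here is a minimal sequential sketch that simulates `p` processors as Python lists. It follows the classic single-round regular sampling scheme: sort locally, take evenly spaced samples, choose splitters from the sorted samples, and redistribute buckets in one all-to-all step. It is not the authors' two-round algorithm and does not implement their duplicate-handling scheme. The function name `regular_sample_sort` and the choice of `p` samples per processor are illustrative assumptions.

```python
import bisect
import heapq
import random

def regular_sample_sort(data, p):
    """Simulate classic sorting by regular sampling on p virtual processors.

    Hypothetical illustration only; returns (sorted_list, bucket_sizes).
    """
    if p < 1:
        raise ValueError("p must be positive")
    n = len(data)
    if n == 0:
        return [], [0] * p
    # Step 1: give each processor a contiguous block of about n/p keys; sort locally.
    blocks = [sorted(data[i * n // p:(i + 1) * n // p]) for i in range(p)]
    # Step 2: each non-empty processor contributes p evenly spaced (regular) samples.
    samples = []
    for block in blocks:
        if block:
            samples.extend(block[k * len(block) // p] for k in range(p))
    samples.sort()
    # Step 3: pick p-1 splitters at regular positions in the sorted sample.
    splitters = [samples[k * len(samples) // p] for k in range(1, p)]
    # Step 4: each processor cuts its sorted block into p buckets; bucket j
    # is "sent" to processor j (the all-to-all personalized communication).
    incoming = [[] for _ in range(p)]
    for block in blocks:
        cuts = [0] + [bisect.bisect_right(block, s) for s in splitters] + [len(block)]
        for j in range(p):
            incoming[j].append(block[cuts[j]:cuts[j + 1]])
    # Step 5: each processor merges its received sorted runs; concatenating
    # the processors' outputs in order gives the globally sorted sequence.
    result = []
    for runs in incoming:
        result.extend(heapq.merge(*runs))
    sizes = [sum(len(r) for r in runs) for runs in incoming]
    return result, sizes

if __name__ == "__main__":
    data = [random.randint(0, 1000) for _ in range(1000)]
    out, sizes = regular_sample_sort(data, 4)
    print(out == sorted(data), sizes)
```

The returned `sizes` list shows how evenly the keys were spread across processors. With heavily duplicated input, `bisect_right` sends every copy of a splitter value to the same processor, which can unbalance the buckets. That is the kind of imbalance the abstract says the authors' algorithm avoids without tagging each element.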
David R. Helman, Joseph JáJá, David