In this paper, we discuss a library generator for parallel sorting routines that examines the characteristics of the input (and the parameters they affect) to select the best-performing algorithm. Our preliminary experimental results show that an automatically generated distributed-memory parallel sorting routine provides up to a fourfold improvement over standard parallel algorithms with typical parameters. Given the growing importance of multicore processors, we are extending this work to shared memory. This extension poses new challenges specific to multicore systems, but the increasing popularity of these systems makes it valuable.
Brian A. Garber, Daniel Hoeflinger, Xiaoming Li, M