In this paper we investigate a tunable MPI collective communications library on a cluster of SMPs. Most tunable collective communications libraries select optimal algorithms for inter-node communication on a given platform. We add another layer of intra-node communication composed of several tunable shared-memory operations. We explore the advantages of our approach and discuss when to use it and when to switch to an alternative on the shared-memory layer. Experimental results indicate that collective communications designed with this approach, when properly tuned, can outperform vendor implementations.

KEY WORDS
Cluster Computing, Collective Communications, MPI, Shared Memory Operations, Tunable Libraries
Meng-Shiou Wu, Ricky A. Kendall, Srinivas Aluru
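To make the layered design concrete, the following is a minimal sketch of a two-level broadcast: an inter-node stage among per-node leaders followed by an intra-node stage within each node. It uses only standard MPI-3 calls (MPI_Comm_split_type with MPI_COMM_TYPE_SHARED) and is an illustration of the general hierarchical idea, not the authors' tuned shared-memory library, which replaces the intra-node stage with tunable shared-memory operations.

/* Illustrative two-level broadcast on a cluster of SMPs.
 * Stage 1: broadcast among one leader process per node.
 * Stage 2: broadcast within each node.
 * Standard MPI-3 only; not the paper's library implementation. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Group processes that share memory (i.e., reside on the same node). */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);

    int node_rank;
    MPI_Comm_rank(node_comm, &node_rank);

    /* One leader per node (node_rank == 0) joins the inter-node communicator. */
    MPI_Comm leader_comm;
    MPI_Comm_split(MPI_COMM_WORLD, node_rank == 0 ? 0 : MPI_UNDEFINED,
                   world_rank, &leader_comm);

    int data = (world_rank == 0) ? 42 : -1;

    /* Stage 1: inter-node broadcast among node leaders. */
    if (leader_comm != MPI_COMM_NULL)
        MPI_Bcast(&data, 1, MPI_INT, 0, leader_comm);

    /* Stage 2: intra-node broadcast from each node's leader; in the paper's
     * approach this layer would use tuned shared-memory operations instead. */
    MPI_Bcast(&data, 1, MPI_INT, 0, node_comm);

    printf("rank %d received %d\n", world_rank, data);

    if (leader_comm != MPI_COMM_NULL) MPI_Comm_free(&leader_comm);
    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}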