MPI Alltoall is one of the most communication-intensive collective operations used in many parallel applications. Recently, the supercomputing arena has witnessed phenomenal growth of commodity clusters built using InfiniBand and multi-core systems. In this context, it is important to optimize this operation for these emerging clusters to allow for good application scaling. However, optimizing MPI Alltoall on these emerging systems is not a trivial task. The InfiniBand architecture allows for varying implementations of the network protocol stack. For example, the protocol can be fully on-loaded to a host processing core, fully off-loaded onto the NIC, or split between the two in any combination. Understanding the characteristics of these different implementations is critical in optimizing a communication-intensive operation such as MPI Alltoall. In this paper, we systematically study these different architectures and propose new schemes for MPI Alltoall tailored to these architectures. Spec...
Rahul Kumar, Amith R. Mamidala, Dhabaleswar K. Pan