The one-sided communication operations in MPI are intended to provide the convenience of directly accessing remote memory and the potential for higher performance than regular point-to-point communication. Our performance measurements with three MPI implementations (IBM MPI, Sun MPI, and LAM) indicate, however, that one-sided communication can perform much worse than point-to-point communication if the associated synchronization calls are not implemented efficiently. In this paper, we describe our efforts to minimize the overhead of synchronization in our implementation of one-sided communication in MPICH-2. We describe our optimizations for all three synchronization mechanisms defined in MPI: fence, post-start-complete-wait, and lock-unlock. Our performance results demonstrate that, for short messages, MPICH-2 performs six times faster than LAM for fence synchronization and 50% faster for post-start-complete-wait synchronization, and it performs more than twice as fast as Sun MPI fo...
Rajeev Thakur, William D. Gropp, Brian R. Toonen