Cache-Efficient, Intranode, Large-Message MPI Communication with MPICH2-Nemesis

The emergence of multicore processors raises the need to efficiently transfer large amounts of data between local processes. MPICH2 is a highly portable MPI implementation whose large-message communication schemes suffer from high CPU utilization and cache pollution because of the use of a double-buffering strategy, common to many MPI implementations. We introduce two strategies offering a kernel-assisted, single-copy model with support for noncontiguous and asynchronous transfers. The first one uses the now widely available vmsplice Linux system call; the second one further improves performance thanks to a custom kernel module called KNEM. The latter also offers I/OAT copy offload, which is dynamically enabled depending on both hardware cache characteristics and message size. These new solutions outperform the standard transfer method in the MPICH2 implementation when no cache is shared between the processing cores or when very large messages are being transferred. Collective communi...

Cache | Distributed And Parallel Computing | ICPP 2009 | MPI Implementations | Portable Mpi Implementation |

Post Info

Added	19 Feb 2011
Updated	19 Feb 2011
Type	Conference
Year	2009
Where	ICPP
Authors	Darius Buntinas, Brice Goglin, David Goodell, Guillaume Mercier, Stephanie Moreaud
