High Throughput Intra-Node MPI Communication with Open-MX

15 years 8 months ago


Abstract: The increasing number of cores per node in high-performance computing requires an efficient intra-node MPI communication subsystem. Most existing MPI implementations rely on two copies across a shared memory-mapped file. Open-MX offers a single-copy mechanism that is tightly integrated in its regular communication stack, making it transparently available to the MX backend of many MPI layers. We describe this implementation and its offloaded copy backend using I/OAT hardware. Memory pinning requirements are then discussed, and overlapped pinning is introduced to enable the start of Open-MX intra-node data transfer earlier. Performance evaluation shows that this local communication stack performs better than MPICH2 and Open MPI for large messages, reaching up to 70% better throughput in microbenchmarks when using I/OAT copy offload. Thanks to a single copy being involved, the Open-MX intra-node communication throughput also does not heavily depend on cache sharing between p...

Brice Goglin
