Petascale parallel computers with more than a million processing cores are expected to be available in a couple of years. Although MPI is the dominant programming interface today ...
Pavan Balaji, Darius Buntinas, David Goodell, Will...
With the ever-increasing numbers of cores per node on HPC systems, applications are increasingly using threads to exploit the shared memory within a node, combined with MPI across ...
As high-end computing systems continue to grow in scale, recent advances in multiand many-core architectures have pushed such growth toward more denser architectures, that is, mor...
Pavan Balaji, Darius Buntinas, David Goodell, Will...
The MPI-2 Standard has carefully specified the interaction between MPI and usercreated threads. The goal of this specification is to allow users to write multithreaded MPI progr...
Initial versions of MPI were designed to work efficiently on multi-processors which had very little job control and thus static process models. Subsequently forcing them to suppor...
Abstract. SKaMPI is a benchmark for MPI implementations. Its purpose is the detailed analysis of the runtime of individual MPI operations and comparison of these for di erent imple...
Ralf Reussner, Peter Sanders, Lutz Prechelt, Matth...
This paper describes how a portable benchmark suite that measures the ability of an MPI implementation to overlap computation and communication can be used to discover and diagnos...
Ron Brightwell, William Lawry, Arthur B. Maccabe, ...
The MPI Standard supports derived datatypes, which allow users to describe noncontiguous memory layout and communicate noncontiguous data with a single communication function. Thi...
Surendra Byna, William D. Gropp, Xian-He Sun, Raje...
This paper presents an MPI implementation that allows an easy and efficient use of the interconnection of several clusters, of potentially heterogeneous nature (as far as the net...
Abstract. The ability to offload functionality to a programmable network interface is appealing, both for increasing message passing performance and for reducing the overhead on t...