In this paper we investigate a tunable MPI collective communications library on a cluster of SMPs. Most tunable collective communications libraries select optimal algorithms for i...
This paper describes how a portable benchmark suite that measures the ability of an MPI implementation to overlap computation and communication can be used to discover and diagnos...
Ron Brightwell, William Lawry, Arthur B. Maccabe, ...
This paper presents preliminary efforts to develop compilation and execution environments that achieve performance portability of multilevel parallelization on hierarchical archit...
Walden Ko, Mark N. Yankelevsky, Dimitrios S. Nikol...
—Understanding the communication behavior and network resource usage of parallel applications is critical to achieving high performance and scalability on systems with tens of th...
Ron Brightwell, Kevin T. Pedretti, Kurt B. Ferreir...
One of the most fundamental problems automatic parallelization tools are confronted with is to find an optimal domain decomposition for a given application. For regular domain prob...