Abstract--This paper explores the computation and communication overlap capabilities enabled by the new CORE-Direct hardware capabilities introduced in the InfiniBand (IB) Host Cha...
Richard L. Graham, Stephen W. Poole, Pavel Shamis,...
Accurate, reproducible and comparable measurement of the overheads, communication times and progression behavior of blocking and nonblocking collective operations is a complicated...
This paper presents a case study about the applicability and usage of non blocking collective operations. These operations provide the ability to overlap communication with computa...
Torsten Hoefler, Peter Gottschling, Andrew Lumsdai...
Abstract-- The performance of collective communication operations is known to have a significant impact on the scalability of some applications. Indeed, the global, synchronous nat...
Ron Brightwell, Sue Goudy, Arun Rodrigues, Keith D...
We introduce LogGOPSim--a fast simulation framework for parallel algorithms at large-scale. LogGOPSim utilizes a slightly extended version of the well-known LogGPS model in combin...
All-to-all broadcast is one of the common collective operations that involve dense communication between all processes in a parallel program. Previously, programmable Network Inte...
PVM and other distributed computing systems have enabled the use of networks of workstations for parallel computation, but their approach of treating all networks as collections o...
This paper describes a first approach to implement MPI-2’s Extended Collective Operations. We aimed to ascertain the feasibility and effectiveness of such a project based on exis...
Collective operations on multiple distributed objects are a powerful means to coordinate parallel computations. In this paper we present an inheritance based approach to implement ...
We study how several collective operations like broadcast, reduction, scan, etc. can be composed efficiently in complex parallel programs. Our specific contributions are: (1) a fo...
Sergei Gorlatch, Christoph Wedler, Christian Lenga...