Distributing data is a fundamental problem in implementing efficient distributed-memory parallel programs. The problem becomes more difficult in environments where the participati...
D. Brent Weatherly, David K. Lowenthal, Mario Naka...
Traditional collective communication algorithms are designed with the assumption that a node can communicate with only one other node at a time. On new parallel architectures such...
Ernie Chan, Robert A. van de Geijn, William Gropp,...
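A simple step-count model shows how the one-port assumption differs from simultaneous communication over multiple links. This is a minimal sketch under an idealized k-port model. In each step, every node that already holds the data forwards it to k new nodes, and message size, bandwidth, and contention are ignored. The function name `broadcast_steps` is hypothetical and is not taken from the work listed above.

```python
def broadcast_steps(p: int, k: int = 1) -> int:
    """Steps needed to broadcast from one root to p nodes in an idealized
    k-port model (hypothetical helper): each informed node sends to k new
    nodes per step, so the informed count grows by a factor of (k + 1)."""
    if p < 1 or k < 1:
        raise ValueError("p and k must be positive")
    informed, steps = 1, 0
    while informed < p:
        informed *= k + 1  # every informed node reaches k more nodes
        steps += 1
    return steps


if __name__ == "__main__":
    # One-port (k = 1) is the classic binomial-tree bound ceil(log2 p).
    print(broadcast_steps(64, 1), broadcast_steps(64, 3))
```

For example, 64 nodes need 6 steps under the one-port model but only 3 when each node can drive 3 links at once.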
In order for collective communication routines to achieve high performance on different platforms, they must be able to adapt to the system architecture and use different algori...
This article presents the C++ library vShark which reduces the intranode communication overhead of parallel programs on clusters of SMPs. The library is built on top of m...
The paper proposes a novel approach for optimizing the performance of all-to-all collective communication by taking advantage of concurrency available in modern networks such as Infin...
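One common baseline for all-to-all is a pairwise-exchange schedule, and concurrency-aware approaches are typically measured against it. The sketch below shows that standard XOR schedule; it is not the method proposed in the paper above. It assumes the number of ranks `p` is a power of two. In step `s`, rank `i` exchanges its block with rank `i ^ s`, so each step is a perfect matching in which all `p/2` pairs can communicate at the same time. The name `pairwise_schedule` is hypothetical.

```python
def pairwise_schedule(p: int) -> list[list[tuple[int, int]]]:
    """XOR pairwise-exchange schedule for all-to-all on p ranks (hypothetical
    helper; assumes p is a power of two). Returns p - 1 steps, each a list of
    disjoint (low, high) rank pairs that exchange data concurrently."""
    if p < 1 or p & (p - 1):
        raise ValueError("p must be a power of two")
    schedule = []
    for s in range(1, p):
        # Rank i talks to i ^ s; keep each unordered pair once.
        pairs = [(i, i ^ s) for i in range(p) if i < i ^ s]
        schedule.append(pairs)
    return schedule


if __name__ == "__main__":
    for step, pairs in enumerate(pairwise_schedule(4), start=1):
        print(step, pairs)
```

Over the `p - 1` steps, every pair of ranks meets exactly once, and no rank appears twice within a step.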