This paper presents the implementation of MPICH2 over the Nemesis communication subsystem and the evaluation of its shared-memory performance. We describe design issues as well as...
This paper presents a case study about the applicability and usage of non blocking collective operations. These operations provide the ability to overlap communication with computa...
Torsten Hoefler, Peter Gottschling, Andrew Lumsdai...
Abstract. In this paper, we focus on MPI collective algorithm selection process and explore the applicability of the quadtree encoding method to this problem. During the algorithm ...
Jelena Pjesivac-Grbovic, George Bosilca, Graham E....
The Sony–Toshiba–IBM Cell Broadband Engine (Cell/B.E.) is a heterogeneous multicore architecture that consists of a traditional microprocessor (PPE) with eight SIMD co-process...
David A. Bader, Virat Agarwal, Kamesh Madduri, Seu...
We explore runtime mechanisms and policies for scheduling dynamic multi-grain parallelism on heterogeneous multi-core processors. Heterogeneous multi-core processors integrate con...
Filip Blagojevic, Dimitrios S. Nikolopoulos, Alexa...
The MPI-2 Standard has carefully specified the interaction between MPI and usercreated threads. The goal of this specification is to allow users to write multithreaded MPI progr...
In this paper, we study the problem of optimal matrix partitioning for parallel dense factorization on heterogeneous processors. First, we outline existing algorithms solving the ...