In this paper we discuss issues related to the highperformance implementation of collective communications operations on distributed-memory computer architectures. Using a combina...
E. W. Chan, M. F. Heimlich, Avi Purkayastha, Rober...
The HPC Challenge (HPCC) benchmark suite and the Intel MPI Benchmark (IMB) are used to compare and evaluate the combined performance of processor, memory subsystem and interconnec...
Subhash Saini, Robert Ciotti, Brian T. N. Gunney, ...
Partial Redundancy Elimination PRE is a general scheme for suppressing partial redundancies which encompasses traditional optimizations like loop invariant code motion and redun...
Abstract--With General Purpose programmable GPUs becoming more and more popular, automated tools are needed to bridge the gap between achievable performance from highly parallel ar...
PDE solvers using Adaptive Mesh Refinement on block structured grids are some of the most challenging applications to adapt to massively parallel computing environments. We descr...
Brian van Straalen, John Shalf, Terry J. Ligocki, ...