Sciweavers

CLUSTER
2004
IEEE
On optimizing collective communication
In this paper we discuss issues related to the high-performance implementation of collective communication operations on distributed-memory computer architectures. Using a combina...
E. W. Chan, M. F. Heimlich, Avi Purkayastha, Rober...
IPPS
2006
IEEE
Performance evaluation of supercomputers using HPCC and IMB benchmarks
The HPC Challenge (HPCC) benchmark suite and the Intel MPI Benchmark (IMB) are used to compare and evaluate the combined performance of processor, memory subsystem and interconnec...
Subhash Saini, Robert Ciotti, Brian T. N. Gunney, ...
PLDI
1995
ACM
Interprocedural Partial Redundancy Elimination and its Application to Distributed Memory Compilation
Partial Redundancy Elimination (PRE) is a general scheme for suppressing partial redundancies which encompasses traditional optimizations like loop invariant code motion and redun...
Gagan Agrawal, Joel H. Saltz, Raja Das
HIPC
2009
Springer
Optimizing the use of GPU memory in applications with large data sets
With general-purpose programmable GPUs becoming increasingly popular, automated tools are needed to bridge the gap between achievable performance from highly parallel ar...
Nadathur Satish, Narayanan Sundaram, Kurt Keutzer
IPPS
2009
IEEE
Scalability challenges for massively parallel AMR applications
PDE solvers using Adaptive Mesh Refinement on block-structured grids are some of the most challenging applications to adapt to massively parallel computing environments. We descr...
Brian van Straalen, John Shalf, Terry J. Ligocki, ...