Sciweavers

Search results for: Optimizing Memory System Performance for Communication in Pa...
CLUSTER
2004
IEEE
On optimizing collective communication
In this paper we discuss issues related to the high-performance implementation of collective communication operations on distributed-memory computer architectures. Using a combina...
E. W. Chan, M. F. Heimlich, Avi Purkayastha, Rober...
IPPS
2006
IEEE
Performance evaluation of supercomputers using HPCC and IMB benchmarks
The HPC Challenge (HPCC) benchmark suite and the Intel MPI Benchmark (IMB) are used to compare and evaluate the combined performance of processor, memory subsystem and interconnec...
Subhash Saini, Robert Ciotti, Brian T. N. Gunney, ...
PLDI
1995
ACM
Interprocedural Partial Redundancy Elimination and its Application to Distributed Memory Compilation
Partial Redundancy Elimination (PRE) is a general scheme for suppressing partial redundancies which encompasses traditional optimizations like loop-invariant code motion and redun...
Gagan Agrawal, Joel H. Saltz, Raja Das
HIPC
2009
Springer
Optimizing the use of GPU memory in applications with large data sets
With general-purpose programmable GPUs becoming more and more popular, automated tools are needed to bridge the gap between achievable performance from highly parallel ar...
Nadathur Satish, Narayanan Sundaram, Kurt Keutzer
IPPS
2009
IEEE
Scalability challenges for massively parallel AMR applications
PDE solvers using Adaptive Mesh Refinement on block structured grids are some of the most challenging applications to adapt to massively parallel computing environments. We descr...
Brian van Straalen, John Shalf, Terry J. Ligocki, ...