We present a cache oblivious algorithm for stencil computations, which arise for example in finite-difference methods. Our algorithm applies to arbitrary stencils in n-dimension...
The combination of increasing component power consumption, a desire for denser systems, and the required performance growth in the face of technology-scaling issues are posing eno...
Wesley M. Felter, Karthick Rajamani, Tom W. Keller...
In order for collective communication routines to achieve high performance on different platforms, they must be able to adapt to the system architecture and use different algori...
Performance simulation tools must be validated during the design process as functional models and early hardware are developed, so that designers can be sure of the performance of...
We present compiler techniques for translating OpenMP shared-memory parallel applications into MPI messagepassing programs for execution on distributed memory systems. This transl...
Given the importance of parallel mesh generation in large-scale scientific applications and the proliferation of multilevel SMTbased architectures, it is imperative to obtain ins...
Christos D. Antonopoulos, Xiaoning Ding, Andrey N....
We consider a cluster architecture in which dynamic content is generated by a database back-end and a collection of Web and application server front-ends. We study the effect of t...