This paper investigates the performance implications of data placement in OpenMP programs running on modern ccNUMA multiprocessors. Data locality and minimization of the rate of r...
Dimitrios S. Nikolopoulos, Theodore S. Papatheodor...
The developers of high-performance scientific applications often work in complex computing environments that place heavy demands on program analysis tools. The developers need to...
Kathleen A. Lindlan, Janice E. Cuny, Allen D. Malo...
This paper describes a Wrapper Generator for wrapping high-performance legacy codes as Java/CORBA components for use in a distributed component-based problem-solving environment. U...
Maozhen Li, Omer F. Rana, Matthew S. Shields, Davi...
This paper reports the progress being made towards complete turbo-pump simulation capability for liquid rocket engines. The Space Shuttle Main Engine (SSME) turbo-pump impeller is...
We use the term “Grid” to refer to a software system that provides uniform and location-independent access to geographically and organizationally dispersed, heterogeneous reso...
William E. Johnston, Dennis Gannon, Bill Nitzberg,...
We describe a novel distributed graphics system that allows an application to render to a large tiled display. Our system, called WireGL, uses a cluster of off-the-shelf PCs conne...
Greg Humphreys, Ian Buck, Matthew Eldridge, Pat Ha...
We demonstrate that data reordering can substantially improve the performance of fine-grained irregular shared-memory benchmarks, on both hardware and software shared-memory syste...
This paper describes performance tuning experiences with a three-dimensional unstructured grid Euler flow code from NASA, which we have reimplemented in the PETSc framework and p...
William Gropp, Dinesh K. Kaushik, David E. Keyes, ...
The current trend in HPC hardware is towards clusters of shared-memory (SMP) compute nodes. For applications developers the major question is how best to program these SMP cluster...