Sciweavers

ACMMSP
2006
ACM

Implicit and explicit optimizations for stencil computations

14 years 7 months ago
Implicit and explicit optimizations for stencil computations
Stencil-based kernels constitute the core of many scientific applications on block-structured grids. Unfortunately, these codes achieve a low fraction of peak performance, due primarily to the disparity between processor and main memory speeds. We examine several optimizations on both the conventional cache-based memory systems of the Itanium 2, Opteron, and Power5, as well as the heterogeneous multicore design of the Cell processor. The optimizations target cache reuse across stencil sweeps, including both an implicit cache oblivious approach and a cache-aware algorithm blocked to match the cache structure. Finally, we consider stencil computations on a machine with an explicitlymanaged memory hierarchy, the Cell processor. Overall, results show that a cache-aware approach is significantly faster than a cache oblivious approach and that the explicitly managed memory on Cell is more efficient: Relative to the Power5, it has almost 2x more memory bandwidth and is 3.7x faster.
Shoaib Kamil, Kaushik Datta, Samuel Williams, Leon
Added 13 Jun 2010
Updated 13 Jun 2010
Type Conference
Year 2006
Where ACMMSP
Authors Shoaib Kamil, Kaushik Datta, Samuel Williams, Leonid Oliker, John Shalf, Katherine A. Yelick
Comments (0)