High-order stencil computations on multicore clusters

Stencil computation (SC) is of critical importance for a broad range of scientific and engineering applications. However, it is a challenge to optimize complex, high-order SC on emerging clusters of multicore processors. We have developed a hierarchical SC parallelization framework that combines: (1) spatial decomposition based on message passing; (2) multithreading using a critical section-free, dual representation; and (3) single-instruction multiple-data (SIMD) parallelism based on various code transformations. Our SIMD transformations include translocated statement fusion, vector composition via shuffle, and vectorized data layout reordering (e.g., matrix transpose), which are combined with traditional optimization techniques such as loop unrolling. We have thereby implemented two SCs of different characteristics: a diagonally dominant lattice Boltzmann method (LBM) code for fluid flow simulation and a highly off-diagonal (6th-order) finite-difference time-domain (FDTD) code for seismic wave propagation...
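To make the terms "high-order stencil" and "spatial decomposition" concrete, here is a minimal sketch in plain Python. It is not the authors' code. It assumes a 1D grid, the standard textbook coefficients for a 6th-order-accurate central difference of the second derivative, and hypothetical function names (`laplacian_1d`, `decomposed_laplacian_1d`). Copying ghost cells by list slicing stands in for the message passing a real cluster code would use.

```python
# Illustrative sketch only; names and the 1D setting are assumptions,
# not taken from the paper.

HALO = 3  # stencil half-width: each point reads 3 neighbours per side

# Standard central-difference weights for d2u/dx2 at 6th-order accuracy:
# centre, then offsets 1, 2, 3 (symmetric on both sides).
COEFFS = (-49.0 / 18.0, 3.0 / 2.0, -3.0 / 20.0, 1.0 / 90.0)


def laplacian_1d(u, dx=1.0):
    """Apply the stencil at every point with HALO neighbours on both sides."""
    out = []
    for i in range(HALO, len(u) - HALO):
        acc = COEFFS[0] * u[i]
        for k in range(1, HALO + 1):
            acc += COEFFS[k] * (u[i - k] + u[i + k])
        out.append(acc / (dx * dx))
    return out


def decomposed_laplacian_1d(u, parts, dx=1.0):
    """Same result, computed subdomain by subdomain.

    Each subdomain owns a contiguous run of interior points and is padded
    with HALO ghost cells per side. The slice below plays the role of the
    halo exchange between neighbouring processes.
    """
    n_interior = len(u) - 2 * HALO
    out = []
    for p in range(parts):
        lo = HALO + p * n_interior // parts
        hi = HALO + (p + 1) * n_interior // parts
        local = u[lo - HALO:hi + HALO]  # owned points plus ghost cells
        out.extend(laplacian_1d(local, dx))
    return out


if __name__ == "__main__":
    u = [float(x * x) for x in range(20)]  # u = x^2, so d2u/dx2 = 2
    print(laplacian_1d(u)[:3])
    print(decomposed_laplacian_1d(u, parts=3) == laplacian_1d(u))
```

The wider the stencil, the wider the ghost region each subdomain needs, which is one reason high-order stencils put more pressure on communication and memory access than low-order ones.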

Distributed and Parallel Computing | Hierarchical SC Parallelization | IPPS 2009 | SIMD Transformations | Strong-scaling SIMD Efficiency

Added	24 May 2010
Updated	24 May 2010
Type	Conference
Year	2009
Where	IPPS
Authors	Liu Peng, Richard Seymour, Ken-ichi Nomura, Rajiv K. Kalia, Aiichiro Nakano, Priya Vashishta, Alexander Loddoch, Michael Netzband, William R. Volz, Chap C. Wong

Sciweavers
