ICPPW
2008
IEEE

Performance Analysis and Optimization of Parallel Scientific Applications on CMP Cluster Systems

Chip multiprocessors (CMPs) are widely used for high-performance computing. Further, these CMPs are being configured in a hierarchical manner to compose a node in a cluster system. A major challenge to be addressed is the efficient use of such cluster systems for large-scale scientific applications. In this paper, we quantify the performance gap resulting from using different numbers of processors per node; this information is used to provide a baseline for the amount of optimization needed when using all processors per node on CMP clusters. We conduct detailed performance analysis to identify how applications can be modified to efficiently utilize all processors per node on CMP clusters, focusing on two scientific applications: the Gyrokinetic Toroidal Code (GTC), a 3D particle-in-cell magnetic fusion application, and a Lattice Boltzmann Method (LBM) code for simulating fluid dynamics. In terms of refinements, we use conventional techniques such as cache blocking, loop unrolling and loop ...
Added 30 May 2010
Updated 30 May 2010
Type Conference
Year 2008
Where ICPPW
Authors Xingfu Wu, Valerie E. Taylor, Charles W. Lively, Sameh Sharkawi