Performance and Scalability Analysis of Cray X1 Vectorization and Multistreaming Optimization



The Cray X1 Fortran and C/C++ compilers provide a number of loop transformations, notably vectorization and multistreaming, to exploit the hardware resources and high memory bandwidth of the multistreaming processor (MSP). A Cray X1 node is composed of four MSPs, each of which is composed of four single-streaming processors (SSPs). Each SSP contains a superscalar processing unit and two vector processing units. Compiler vectorization provides loop-level parallelization and uses the vector processing hardware. Multistreaming code generation by the compiler permits a block of code to execute across the SSPs of an MSP. In this paper, we analyze the overall impact of loop-level compiler optimization on a scientific application called the Parallel Ocean Program (POP). POP has been extensively optimized for the X1 by instrumenting the code with X1 compiler directives. We compare and contrast the automatic and manual optimization schemes available on the X1 and analyze their impact on code performance...
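
As a rough illustration of the two loop transformations the abstract describes, the sketch below shows a loop nest in plain C. It is not taken from the paper or from POP: the function name `scaled_add`, the constant `SSPS_PER_MSP`, and the block-partitioning scheme are assumptions made for illustration only. The outer loop is split into four contiguous blocks of rows, mimicking how multistreaming might divide work across the four SSPs of an MSP (the blocks run one after another here, not concurrently). The inner loop is stride-1 with no loop-carried dependence, the kind of loop a vectorizing compiler can typically map onto vector hardware.

```c
#include <stddef.h>

/* From the abstract: one MSP is composed of four SSPs. */
#define SSPS_PER_MSP 4

/*
 * Hypothetical kernel: y[i][j] += a * x[i][j] on a rows x cols grid
 * stored row-major. Assumes x and y do not overlap.
 */
void scaled_add(size_t rows, size_t cols, double a,
                const double *x, double *y)
{
    /* Outer level: one contiguous block of rows per SSP. This loop only
       emulates the partitioning serially; it does not run in parallel. */
    for (size_t ssp = 0; ssp < SSPS_PER_MSP; ++ssp) {
        size_t begin = rows * ssp / SSPS_PER_MSP;
        size_t end   = rows * (ssp + 1) / SSPS_PER_MSP;

        for (size_t i = begin; i < end; ++i) {
            /* Inner level: unit stride, independent iterations,
               so this loop is a candidate for vectorization. */
            for (size_t j = 0; j < cols; ++j) {
                y[i * cols + j] += a * x[i * cols + j];
            }
        }
    }
}
```

The block boundaries are computed so that every row is covered exactly once even when `rows` is not a multiple of four; some blocks are then simply smaller or empty.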

Sadaf R. Alam, Jeffrey S. Vetter


Compiler | Cray X1 | Cray X1 Node | ICCS 2005 |


Post Info

Added	27 Jun 2010
Updated	27 Jun 2010
Type	Conference
Year	2005
Where	ICCS
Authors	Sadaf R. Alam, Jeffrey S. Vetter
