High performance computers currently under construction, such as IBM’s Blue Gene/L, consisting of large numbers (64K) of low cost processing elements with relatively small local memories (256MB) connected via relatively low bandwidth (0.0625 Bytes/FLOP) low cost interconnection networks promise exceptional cost-performance for some scientific applications. Due to the large number of processing elements and adaptive routing networks in such systems, performance analysis of meaningful application kernels requires innovative methods. This paper describes a method that combines application analysis, tracing and parallel discrete event simulation to provide early performance prediction. Specifically, results of performance analysis of a Lennard-Jones Spatial (LJS) Decomposition molecular dynamics benchmark code for Blue Gene/L are given.
Ed Upchurch, Paul L. Springer, Maciej Brodowicz, S