We present a model for the parallel performance of algorithms that consist of concurrent, two-dimensional wavefronts implemented in a message-passing environment. The model combines the separate contributions of computation and communication wavefronts. We validate the model on three important supercomputer systems, on up to 500 processors. We use data from a deterministic particle transport application taken from the ASCI workload, although the model applies to any wavefront algorithm implemented on a 2-D processor domain. We also use the validated model to estimate the performance and scalability of wavefront algorithms on 100-TFLOPS computer systems expected to exist within the next decade as part of the ASCI program and elsewhere. On such machines, our analysis shows that, contrary to conventional wisdom, inter-processor communication performance is not the bottleneck; single-node efficiency is the dominant factor.
Adolfy Hoisie, Olaf M. Lubeck, Harvey J. Wasserman