We present a set of advanced program parallelization techniques that can significantly improve the performance of application programs. We present evidence for this improvement in terms of the overall program speedup that we have achieved on the Perfect Benchmarks® programs, and in terms of the performance gains that can be attributed to individual techniques. These numbers were measured on the Cedar multiprocessor at the University of Illinois. This paper extends the findings previously reported in [EHLP91]. The techniques credited most for the performance gains include array privatization, parallelization of reduction operations, and the substitution of generalized induction variables. We have applied these transformations by hand to the given programs, in a mechanical manner, similar to that of a parallelizing compiler. Because of our success with these transformations, we believe that it will be possible to implement many of these techniques in a new generation of parallelizing ...
Rudolf Eigenmann, Jay Hoeflinger, David A. Padua