Exploiting parallelism at both the multiprocessor level and the instruction level is an effective means for supercomputers to achieve high performance. The amount of instruction-level parallelism available to superscalar or VLIW node processors can be limited, however, when only conventional compiler optimization techniques are used. In this paper, a set of compiler transformations designed to increase instruction-level parallelism is described. The effectiveness of these transformations is evaluated using 40 loop nests extracted from a range of supercomputer applications. This evaluation shows that increasing execution resources in superscalar or VLIW node processors yields little performance improvement unless loop unrolling and register renaming are applied. It also reveals that these two transformations are sufficient for DOALL loops. However, more advanced transformations are required for serial and DOACROSS loops to fully benefit from the increased execution resources. The results show that ...
Scott A. Mahlke, William Y. Chen, John C. Gyllenha