level of abstraction, compared with the program representation for scalar optimizations. For example, loop unrolling and loop unroll-and-jam transformations exploit the large register file to eliminate redundant references to array elements and to expose more parallelism. Scalar replacement of memory references replaces array references with register references. Linear loop transformations, loop fusion, loop tiling, and loop distribution improve cache locality. Finally, data prefetching overlaps memory access latency with computation. A primary objective of scalar optimizations is to minimize the number of computations and the number of references to memory. The primary scalar optimization that achieves this objective is partial redundancy elimination (PRE),3 which minimizes the number of times an expression is evaluated. We extended PRE to use control and data speculation to eliminate more loads. PRE's counterpart, called partial dead-store elimination (PDSE), serves to remove red...
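As a rough illustration of two of the transformations named above, the C sketch below applies them by hand. The first pair of functions shows unroll-and-jam combined with scalar replacement on a small matrix-vector kernel. The second pair shows the basic idea of partial redundancy elimination without the speculation extensions. The kernel, the function names, and the size `N` are assumptions made for this example and do not come from the article. A compiler would perform these rewrites on its internal representation, not on source code.

```c
#include <assert.h>
#include <stddef.h>

#define N 8  /* hypothetical problem size; must be even for the unrolled loop */

/* Baseline: every inner iteration loads and stores y[i] and loads x[j]. */
void matvec_baseline(double a[N][N], const double x[N], double y[N])
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            y[i] += a[i][j] * x[j];
}

/* Unroll-and-jam: the outer loop is unrolled by 2 and the two copies of the
   inner loop are fused. Scalar replacement keeps y[i], y[i+1], and x[j] in
   local scalars, which a compiler can assign to registers. */
void matvec_unroll_jam(double a[N][N], const double x[N], double y[N])
{
    for (size_t i = 0; i < N; i += 2) {
        double y0 = y[i];
        double y1 = y[i + 1];
        for (size_t j = 0; j < N; j++) {
            double xj = x[j];        /* one load of x[j] feeds two multiplies */
            y0 += a[i][j] * xj;      /* the two updates are independent */
            y1 += a[i + 1][j] * xj;
        }
        y[i] = y0;                   /* one store per row instead of N */
        y[i + 1] = y1;
    }
}

/* Before PRE: when cond is true, a * b is evaluated twice on that path,
   so the second evaluation is partially redundant. */
int pre_before(int a, int b, int cond)
{
    int u = 0;
    if (cond)
        u = a * b;
    int v = a * b;
    return u + v;
}

/* After PRE: an evaluation is inserted on the path that lacked one, which
   makes the later evaluation fully redundant. Each path now evaluates
   a * b exactly once. */
int pre_after(int a, int b, int cond)
{
    int t;
    int u = 0;
    if (cond) {
        t = a * b;
        u = t;
    } else {
        t = a * b;                   /* inserted evaluation */
    }
    int v = t;                       /* reuse instead of recomputing */
    return u + v;
}

/* Checks that each transformed version matches its original. The inputs are
   small integers stored as doubles, so all arithmetic is exact. */
int transformations_agree(void)
{
    double a[N][N], x[N], y_ref[N], y_opt[N];
    for (size_t i = 0; i < N; i++) {
        x[i] = (double)(i + 1);
        y_ref[i] = y_opt[i] = (double)i;
        for (size_t j = 0; j < N; j++)
            a[i][j] = (double)(i + j);
    }
    matvec_baseline(a, x, y_ref);
    matvec_unroll_jam(a, x, y_opt);
    for (size_t i = 0; i < N; i++)
        assert(y_ref[i] == y_opt[i]);
    assert(pre_before(3, 4, 0) == pre_after(3, 4, 0));
    assert(pre_before(3, 4, 1) == pre_after(3, 4, 1));
    return 1;
}
```

Calling `transformations_agree()` from a `main` function exercises both pairs. In the unrolled kernel, the per-row accumulation order is unchanged, so the results match the baseline while the inner loop performs fewer memory references per multiply.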
Rakesh Krishnaiyer, Dattatraya Kulkarni, Daniel M.