Modulo scheduling is an efficient technique for exploiting instruction-level parallelism in a variety of loops, resulting in high-performance code but increased register requirements. We present a combined approach that schedules the loop operations for the highest steady-state throughput and minimum register requirements. Our method determines optimal register requirements for machines with finite resources and for general dependence graphs. We compare the performance of this and other modulo schedulers for a benchmark of 629 loops from the Perfect Club, SPEC-89, and the Livermore Fortran Kernels. Measurements demonstrate the potential of register-sensitive modulo schedulers, which will be useful in evaluating the performance of register-sensitive modulo scheduling heuristics.
Alexandre E. Eichenberger, Edward S. Davidson, San