In modern computer systems, loops present many opportunities for increasing instruction-level parallelism (ILP) and thread-level parallelism (TLP). Loop unrolling is used to obtain greater ILP, while assigning independent loop iterations to different threads yields greater TLP. Both approaches must ensure that exactly the correct number of iterations is executed, and techniques are needed to avoid the unnecessary checks this can introduce. In this paper we evaluate simple loop transformation techniques that improve performance by eliminating some of the unnecessary conditional instructions that check iteration bounds. We report the number of instructions eliminated, along with the resulting improvements in branch prediction rates and execution performance. Our techniques are applicable to most modern architectures, including superscalar, multithreaded, VLIW, and EPIC systems.
Key words. ILP, TLP, Loop Level Parallelism, Branch Prediction, Code Transformation.
Litong Song, Yuhua Zhang, Krishna M. Kavi
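To make concrete the kind of iteration-bound check the abstract refers to, the following C sketch contrasts a loop unrolled four ways that re-tests the bound before every element with a version that tests it once per unrolled group and handles leftover iterations in a short remainder loop. The function names `sum_checked` and `sum_unrolled`, the unroll factor of 4, and the summation kernel are assumptions chosen for illustration; this is a generic textbook form of the idea, not necessarily the transformation the paper evaluates.

```c
#include <stddef.h>

/* Naive 4-way unrolling (illustrative): the loop bound is re-checked
 * before every element, so each unrolled body carries extra branches. */
long sum_checked(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    while (i < n) {
        s += a[i++];
        if (i >= n) break;
        s += a[i++];
        if (i >= n) break;
        s += a[i++];
        if (i >= n) break;
        s += a[i++];
    }
    return s;
}

/* Transformed version (illustrative): one bound check per group of four
 * elements, followed by a remainder loop for the last n % 4 elements. */
long sum_unrolled(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    size_t limit = n - (n % 4);   /* largest multiple of 4 not exceeding n */

    for (; i < limit; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    for (; i < n; i++) {          /* at most three leftover iterations */
        s += a[i];
    }
    return s;
}
```

Both functions compute the same result. The second executes roughly one conditional branch per four elements in its main loop instead of four, which is the sort of reduction in bound-checking instructions the abstract describes.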