To take advantage of recent architectural improvements in microprocessors, advanced compiler optimizations such as software pipelining have been developed [1, 2, 3, 4]. Unfortunately, not all loops have enough parallelism in the innermost loop body to take advantage of all of the resources a machine provides. Unroll-and-jam is a transformation that can be used to increase the amount of parallelism in the innermost loop body by making better use of resources and limiting the effects of recurrences [5, 6]. In this paper, we demonstrate how unroll-and-jam can significantly improve the initiation interval in a software-pipelined loop. Improvements in the initiation interval of greater than 40% are common, while dramatic improvements of a factor of 5 are possible.
Steve Carr, Chen Ding, Philip H. Sweany