Data locality and synchronization overhead are two important factors that affect the performance of applications on multiprocessors. Loop fusion is an effective way to reduce synchronization and improve data locality. Traditional fusion techniques, however, either cannot handle nested loops that contain fusion-preventing dependencies or cannot achieve good parallelism after fusion. This paper makes a significant addition to current loop fusion techniques by presenting several efficient polynomial-time algorithms that solve these problems. These algorithms, based on multi-dimensional retiming, allow nested loop fusion even in the presence of outermost loop-carried dependencies or fusion-preventing dependencies. The multiple loops are modeled by a multi-dimensional loop dependence graph. The algorithms are applied to this graph to perform the fusion and to obtain full parallelism in the innermost loop.

Key Words: Loop Fusion, Loop Transformation, Nested Lo...
Edwin Hsing-Mean Sha, Chenhua Lang, Nelson L. Pass
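
To make the abstract's central idea concrete, the following is a minimal sketch of a fusion-preventing dependency and of how delaying one statement along the outer loop removes it. It is only an illustration under stated assumptions, not the paper's algorithm: the arrays `in`, `a`, `b`, the extents `N` and `M`, the loop bodies, and the function names are all hypothetical, and the shift by one outer iteration was chosen by hand rather than derived from a loop dependence graph.

```c
#include <assert.h>
#include <string.h>

#define N 6 /* outer (i) extent, hypothetical */
#define M 8 /* inner (j) extent, hypothetical */

/* Original form: two separate inner loops per outer iteration.
   The second loop reads a[i][j + 1], which the first loop produces one
   inner iteration later, so merging the two j-loops as they stand would
   read a value that has not been written yet. */
void unfused(int in[N][M], int a[N][M], int b[N][M]) {
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < M; j++)
            a[i][j] = 2 * in[i][j];
        for (int j = 1; j < M - 1; j++)
            b[i][j] = a[i][j - 1] + a[i][j + 1];
    }
}

/* Fused form: the second statement is delayed by one OUTER iteration.
   In iteration (i, j) it computes b[i - 1][j] from row i - 1 of a, which
   was completed earlier. No iteration of the j-loop depends on another
   iteration of the same j-loop, so the inner loop could run in parallel. */
void fused(int in[N][M], int a[N][M], int b[N][M]) {
    /* Prologue: row 0 of a has no delayed statement to pair with. */
    for (int j = 0; j < M; j++)
        a[0][j] = 2 * in[0][j];

    for (int i = 1; i < N; i++) {
        for (int j = 0; j < M; j++) {
            a[i][j] = 2 * in[i][j];
            if (j >= 1 && j < M - 1)
                b[i - 1][j] = a[i - 1][j - 1] + a[i - 1][j + 1];
        }
    }

    /* Epilogue: the delayed statement still owes the last row of b. */
    for (int j = 1; j < M - 1; j++)
        b[N - 1][j] = a[N - 1][j - 1] + a[N - 1][j + 1];
}

/* Returns 1 if both forms produce identical a and b on a test input. */
int fused_matches_unfused(void) {
    int in[N][M], a1[N][M], b1[N][M], a2[N][M], b2[N][M];

    for (int i = 0; i < N; i++)
        for (int j = 0; j < M; j++)
            in[i][j] = i * M + j;

    /* Boundary columns of b are never written, so zero them first. */
    memset(b1, 0, sizeof b1);
    memset(b2, 0, sizeof b2);

    unfused(in, a1, b1);
    fused(in, a2, b2);

    return memcmp(a1, a2, sizeof a1) == 0 &&
           memcmp(b1, b2, sizeof b1) == 0;
}
```

Calling `assert(fused_matches_unfused());` from a `main` function checks that the two forms agree. Shifting along the inner index instead (computing `b[i][j - 1]` in iteration `j`) would also make fusion legal, but it would leave a dependency between inner iterations; shifting along the outer index is what keeps the inner loop free of them in this sketch.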