This paper addresses general nested loops and proposes a novel scheduling methodology for reducing the communication cost of parallel programs. General loops contain complex loop bodies (consisting of arbitrary program statements, such as assignments, conditions and repetitions) that exhibit uniform loop-carried dependencies. It is therefore now possible to achieve efficient parallelization for a vast class of loops, mostly found in DSP, PDEs, and signal and video coding. We use computational geometry methods that efficiently exploit the regularity of nested loops’ index spaces in order to significantly reduce the communication cost, which in most cases is the main drawback for the performance of parallel programs. Through extensive testing, we show that the proposed method outperforms the classic cyclic mapping in all cases, reducing the communication by 15%-35%. This significant reduction of the communication volume makes our method a promising candidate to be incorporated i...
Florina M. Ciorba, Theodore Andronikos, Ioannis Dr