Interprocessor synchronization, while extremely important for ensuring execution correctness, can be very costly in terms of both power and performance overheads. Unfortunately, many parallelizing compilers are very conservative in inserting barrier synchronizations at the end of each and every parallel loop. This can lead to significant power consumption in chip multiprocessor based execution environments. This paper proposes a compiler-directed approach for eliminating such synchronization calls between neighboring parallel loops. It achieves its goal by partitioning loop iterations across processors such that each processor executes iterations from both the loops that access the same set of array elements. We implemented the proposed approach using an experimental compilation framework and made experiments with ten SPEC benchmark codes. Our experiments clearly show that the proposed compilerdirected approach is very effective and reduces energy overheads due to synchronizations by...
Mahmut T. Kandemir, Seung Woo Son