Much research has been devoted to making microprocessors energy-efficient. However, little attention has been paid to multiprocessor environments where, due to the co-operative na...
We examine the ability of CMPs, due to their lower onchip communication latencies, to exploit data parallelism at inner-loop granularities similar to that commonly targeted by vec...
A global barrier synchronizes all processors in a parallel system. This paper investigates algorithms that allow disjoint subsets of processors to synchronize independently and in...
Anja Feldmann, Thomas R. Gross, David R. O'Hallaro...