Increased complexity of memory systems to ameliorate the gap between the speed of processors and memory has made it increasingly harder for compilers to optimize an arbitrary code...
Parallel programming models should attempt to satisfy two conflicting goals. On one hand, they should hide architectural details so that algorithm designers can write simple, port...
Brian Grayson, Michael Dahlin, Vijaya Ramachandran
Current microprocessors aggressively exploit instructionlevel parallelism (ILP) through techniques such as multiple issue, dynamic scheduling, and non-blocking reads. Recent work ...
Parthasarathy Ranganathan, Vijay S. Pai, Hazim Abd...
We examine the ability of CMPs, due to their lower onchip communication latencies, to exploit data parallelism at inner-loop granularities similar to that commonly targeted by vec...
Much research has been done in fast communication on clusters and in protocols for supporting software shared memory across them. However, the end performance of applications that...