Most shared memory systems maximize performance by unpredictably resolving memory races. Unpredictable memory races can lead to nondeterminism in parallel programs, which can suff...
Derek Hower, Polina Dudnik, Mark D. Hill, David A....
Communication in cache-coherent distributed shared memory (DSM) often requires invalidating (or writing back) cached copies of a memory block, incurring high overheads. This paper...
The explosive growth in the performance of microprocessors and networks has created a new opportunity to reduce the latency of fine-grain communication. Microprocessor clock speed...
The goal of this work is to explore architectural mechanisms for supporting explicit communication in cachecoherent shared memory multiprocessors. The motivation stems from the ob...
Heterogeneous multi-core processors, such as the IBM Cell processor, can deliver high performance. However, these processors are notoriously difficult to program: different cores...