We comparethe performance of software-supported shared memory on a general-purpose network to hardware-supported shared memory on a dedicated interconnect. Up to eight processors,...
Alan L. Cox, Sandhya Dwarkadas, Peter J. Keleher, ...
Data prefetching has been widely used in the past as a technique for hiding memory access latencies. However, data prefetching in multi-threaded applications running on chip multi...
Dhruva Chakrabarti, Mahmut T. Kandemir, Mustafa Ka...
We present a parallel code generation algorithm for complete applications and a new experimental methodology that tests the efficacy of our approach. The algorithm optimizes for d...
Analysis of technology and application trends reveals a growing imbalance in the peak compute-to-memory-capacity ratio for future servers. At the same time, the fraction contribut...
Kevin T. Lim, Jichuan Chang, Trevor N. Mudge, Part...
We evaluate the e ect of processor speed, network bandwidth, and software overhead on the performance of release-consistent software distributed shared memory. We examine ve di er...
Sandhya Dwarkadas, Peter J. Keleher, Alan L. Cox, ...