Address re-mapping techniques in so-called active memory systems have been shown to dramatically increase the performance of applications with poor cache and/or communication beha...
Dhiraj D. Kalamkar, Mainak Chaudhuri, Mark Heinric...
Much of high performance technical computing has moved from shared memory architectures to message based cluster systems. The development and wide adoption of the MPI parallel pro...
The use of multi-core, multi-processor machines is opening new opportunities for software speculation, where program code is speculatively executed to improve performance at the a...
Shared memory is an appealing abstraction for parallel programming. It must be implemented with caches in order toperform well, however, and caches require a coherence mechanism t...
This paper proposes a new hardware technique for using one core of a CMP to prefetch data for a thread running on another core. Our approach simply executes a copy of all non-cont...