We describe the design and implementation of Dynamo, a software dynamic optimization system that is capable of transparently improving the performance of a native instruction stre...
Vasanth Bala, Evelyn Duesterwald, Sanjeev Banerjia
This paper investigates helper threads that improve performance by prefetching data on behalf of an application’s main thread. The focus is data prefetch helper threads that lac...
Effective use of communication networks is critical to the performance and scalability of parallel applications. Partitioned Global Address Space languages like UPC bring the pro...
Global addressing of shared data simplifies parallel programming and complements message passing models commonly found in distributed memory machines. A number of programming sys...
Beng-Hong Lim, Chi-Chao Chang, Grzegorz Czajkowski...
Chip multiprocessors (CMPs) are expected to be the building blocks for future computer systems. While architecting these emerging CMPs is a challenging problem on its own, program...
Ozcan Ozturk, Mahmut T. Kandemir, Mary Jane Irwin,...