Iterative stencil loops (ISLs) are used in many applications and tiling is a well-known technique to localize their computation. When ISLs are tiled across a parallel architecture...
We assert that in order to perform well, a shared-memory multiprocessorinter-process communication (IPC)facility mustavoid a) accessing any shared data, and b) acquiring any locks...
– In this paper we present a new approach to benchmark the performance of shared memory systems. This approach focuses on recognizing how far off the performance of a given memor...
Memory system optimizations have been well studied on single-threaded systems; however, the wide use of simultaneous multithreading (SMT) techniques raises questions over their ef...
To execute a shared memory program efficiently, we have to manage memory consistency with low overheads, and have to utilize communication bandwidth of the platform as much as pos...