In a Chip Multi-Processor (CMP) with private caches, the last level cache is statically partitioned between all the cores. This prevents such CMPs from sharing cache capacity in r...
Parallel programs that use critical sections and are executed on a shared-memory multiprocessor with a writeinvalidate protocol result in invalidation actions that could be elimin...
Scalability of applications on distributed sharedmemory (DSM) multiprocessors is limited by communication overheads. At some point, using more processors to increase parallelism y...
Khaled Z. Ibrahim, Gregory T. Byrd, Eric Rotenberg
Current on-chip block-centric memory hierarchies exploit access patterns at the fine-grain scale of small blocks. Several recently proposed techniques for coherence traffic reduct...
The large working sets of commercial and scientific workloads stress the L2 caches of Chip Multiprocessors (CMPs). Some CMPs use a shared L2 cache to maximize the on-chip cache c...
Bradford M. Beckmann, Michael R. Marty, David A. W...