The interconnect mechanisms (shared bus or crossbar) used in current chip-multiprocessors (CMPs) are expected to become a bottleneck that prevents these architectures from scaling...
Realizing scalable cache coherence in the many-core era comes with a whole new set of constraints and opportunities. It is widely believed that multi-hop, unordered on-chip networ...
Recently-proposed architectures that continuously operate on atomic blocks of instructions (also called chunks) can boost the programmability and performance of shared-memory mult...
Address re-mapping techniques in so-called active memory systems have been shown to dramatically increase the performance of applications with poor cache and/or communication beha...
Dhiraj D. Kalamkar, Mainak Chaudhuri, Mark Heinric...
Most large shared-memory multiprocessors use directory protocols to keep per-processor caches coherent. Some memory references in such systems, however, suffer long latencies for ...