On-chip coherence directories of today's multi-core systems are not energy efficient. Coherence directories dissipate a significant fraction of their power on unnecessary loo...
Pejman Lotfi-Kamran, Michael Ferdman, Daniel Crisa...
Symmetric multiprocessors (SMPs) connected with low-latency networks provide attractive building blocks for software distributed shared memory systems. Two distinct approaches hav...
While architects understand how to build cost-effective parallel machines across a wide spectrum of machine sizes (ranging from within a single chip to large-scale servers), the re...
J. Gregory Steffan, Christopher B. Colohan, Antoni...
Recent research indicates that prediction-based coherence optimizations offer substantial performance improvements for scientific applications in distributed shared memor...
Stephen Somogyi, Thomas F. Wenisch, Nikolaos Harda...
The cache hierarchy design in existing SMT and superscalar processors is optimized for latency, but not for bandwidth. The size of the L1 data cache did not scale over the past de...