With the trend towards increasing number of processor cores in future chip architectures, scalable directory-based protocols for maintaining cache coherence will be needed. Howeve...
This paper presents and studies a distributed L2 cache management approach through OS-level page allocation for future many-core processors. L2 cache management is a crucial multi...
Instruction aggregation—the grouping of multiple operations into a single processing unit—is a technique that has recently been used to amplify the bandwidth and capacity of c...
The large working sets of commercial and scientific workloads stress the L2 caches of Chip Multiprocessors (CMPs). Some CMPs use a shared L2 cache to maximize the on-chip cache c...
Bradford M. Beckmann, Michael R. Marty, David A. W...
We introduce virtually-pipelined memory, an architectural technique that efficiently supports high-bandwidth, uniform latency memory accesses, and high-confidence throughput eve...
3D die stacking is an exciting new technology that increases transistor density by vertically integrating two or more die with a dense, high-speed interface. The result of 3D die ...
Bryan Black, Murali Annavaram, Ned Brekelbaum, Joh...