In order to achieve high performance, contemporary microprocessors must effectively process the four major instruction types: ALU, branch, load, and store instructions. This paper...
Bryan Black, Brian Mueller, Stephanie Postal, Ryan...
Memory interleaving is a cost-efficient approach to increase bandwidth. Improving data access locality and reducing memory access conflicts are two important aspects to achieve hi...
We introduce virtually-pipelined memory, an architectural technique that efficiently supports high-bandwidth, uniform latency memory accesses, and high-confidence throughput eve...
The cache hierarchy design in existing SMT and superscalar processors is optimized for latency, but not for bandwidth. The size of the L1 data cache did not scale over the past de...
Abstract— In this paper, we describe the design and implementation of the primary memory system of the TRIPS processor. To match the aggressive execution bandwidth and support hi...
Simha Sethumadhavan, Robert G. McDonald, Rajagopal...