We introduce virtually-pipelined memory, an architectural technique that efficiently supports high-bandwidth, uniform latency memory accesses, and high-confidence throughput eve...
This paper introduces a compiler-orchestrated prefetching system as a unified framework geared toward ameliorating the gap between processing speeds and memory access latencies. ...
Rodric M. Rabbah, Hariharan Sandanagobalane, Mongk...
For many applications, branch mispredictions and cache misses limit a processor’s performance to a level well below its peak instruction throughput. A small fraction of static i...
This paper makes the case that pin bandwidth will be a critical consideration for future microprocessors. We show that many of the techniques used to tolerate growing memory laten...
The RAMpage hierarchy moves main memory up a level to replace the lowest-level cache by an equivalent-sized SRAM main memory. This paper is a first look at the value of RAMpage to ...