High-performance out-of-order processors spend a significant portion of their execution time on the incorrect program path even though they employ aggressive branch prediction al...
Onur Mutlu, Hyesoon Kim, David N. Armstrong, Yale ...
As the speed gap between CPU and memory widens, memory hierarchy has become the primary factor limiting program performance. Until now, the principal focus of hardware and softwar...
We present a fine-grain dynamic instruction placement algorithm for small L0 scratch-pad memories (spms), whose unit of transfer can be an individual instruction. Our algorithm ca...
This paper provides quantitative measurements of load latency tolerance in a dynamically scheduled processor. To determine the latency tolerance of each memory load operation, our...
Widely adopted distributor-based systems forward user requests to a balanced set of waiting servers in complete transparency to the users. The policy employed in forwarding reques...
Heung Ki Lee, Gopinath Vageesan, Ki Hwan Yum, Eun ...