Conventional processors use a fully-associative store queue (SQ) to implement store-load forwarding. Associative search latency does not scale well to capacities and bandwidths re...
Because they are based on large content-addressable memories, load-store queues (LSQ) present implementation challenges in superscalar processors, especially as issue width and nu...
Multicore processors have emerged as a powerful platform on which to efficiently exploit thread-level parallelism (TLP). However, due to Amdahl’s Law, such designs will be incr...
Buffering more in-flight instructions in an out-of-order microprocessor is a straightforward and effective method to help tolerate the long latencies generally associated with ...
Alok Garg, Fernando Castro, Michael C. Huang, Dani...