There is a continuous research effort devoted to overcome the memory wall problem. Prefetching is one of the most frequently used techniques. A prefetch mechanism anticipates the ...
The trend in microprocessor design toward multicore and manycore processors means that future performance gains in software will largely come from harnessing parallelism. To reali...
This paper describes FAST, a novel simulation methodology that can produce simulators that (i) are orders of magnitude faster than comparable simulators, (ii) are cycleaccurate, (...
Derek Chiou, Dam Sunwoo, Joonsoo Kim, Nikhil A. Pa...
In this paper we present an automated way of using spare CPU resources within a shared memory multi-processor or multi-core machine. Our approach is (i) to profile the execution o...
Since there is generally insufficient instruction level parallelism within a single basic block, higher performance is achieved by speculatively scheduling operations in superbloc...