This paper proposes a new hardware technique for using one core of a CMP to prefetch data for a thread running on another core. Our approach simply executes a copy of all non-cont...
As processors continue to exploit more instruction level parallelism, a greater demand is placed on reducing the e ects of memory access latency. In this paper, we introduce a nov...
The Multiscalar architecture advocates a distributed processor organization and task-level speculation to exploit high degrees of instruction level parallelism (ILP) in sequential...
Allowing loads to issue out-of-order with respect to earlier unresolved store addresses is very important for extracting parallelism in large-window superscalar processors. Blindl...
The superblock is a scheduling region which exposes instruction level parallelism beyond the basic block through speculative execution of instructions. In general, scheduling supe...