One can e ectively utilize predicated execution to improve branch handling in instruction-level parallel processors. Although the potential bene ts of predicated execution are hig...
Scott A. Mahlke, Richard E. Hank, James E. McCormi...
Latency tolerance is essential in achieving high performance on parallel computers for remote function calls and fine-grained remote memory accesses. EM-X supports interprocessor ...
The PowerPC 620TM microprocessor1 is the most recent and performance leading member of the PowerPCTM family. The 64-bit PowerPC 620 microprocessor employs a two-phase branch predi...
Accurate instruction fetch and branch prediction is increasingly important on today’s wide-issue architectures. Fetch prediction is the process of determining the next instructi...
Recent superscalar processors issue four instructions per cycle. These processors are also powered by highly-parallel superscalar cores. The potential performance can only be expl...
Thomas M. Conte, Kishore N. Menezes, Patrick M. Mi...
For many programs, especially integer codes, untolerated load instruction latencies account for a significant portion of total execution time. In this paper, we present the desig...
Todd M. Austin, Dionisios N. Pnevmatikatos, Gurind...
Speculative execution is execution of instructions before it is known whether these instructions should be executed. Compiler-based speculative execution has the potential to achi...
Profile data is valuable for identifying performance bottlenecks and guiding optimizations. Periodic sampling of a processor's performance monitoring hardware is an effective...
Jeffrey Dean, James E. Hicks, Carl A. Waldspurger,...