Modern high-performance architectures require extremely accurate branch prediction to overcome the performance limitations of conditional branches. We present a framework that cat...
Previous research has shown that the SPEC benchmarks achieve low miss ratios in relatively small instruction caches. This paper presents evidence that current software-development...
Richard Uhlig, David Nagle, Trevor N. Mudge, Stuar...
Communicationin aparallel systemfrequently involvesmoving data from the memory of one node to the memory of another; this is the standard communication model employedin message pa...
Current trends in processor design are pointing to deeper and wider pipelines and superscalar architectures. The efficient use of these resources requires speculative execution, ...
Dennis Lee, Jean-Loup Baer, Brad Calder, Dirk Grun...
This paper introduces dynamic self-invalidation (DSI), a new technique for reducing cache coherence overhead in shared-memory multiprocessors. DSI eliminates invalidation messages...
One can e ectively utilize predicated execution to improve branch handling in instruction-level parallel processors. Although the potential bene ts of predicated execution are hig...
Scott A. Mahlke, Richard E. Hank, James E. McCormi...
Latency tolerance is essential in achieving high performance on parallel computers for remote function calls and fine-grained remote memory accesses. EM-X supports interprocessor ...
The PowerPC 620TM microprocessor1 is the most recent and performance leading member of the PowerPCTM family. The 64-bit PowerPC 620 microprocessor employs a two-phase branch predi...
Accurate instruction fetch and branch prediction is increasingly important on today’s wide-issue architectures. Fetch prediction is the process of determining the next instructi...
Recent superscalar processors issue four instructions per cycle. These processors are also powered by highly-parallel superscalar cores. The potential performance can only be expl...
Thomas M. Conte, Kishore N. Menezes, Patrick M. Mi...