Path profiles record the frequencies of execution paths through a program. Until now, the best global instruction schedulers have relied upon profile-gathered frequencies of condi...
Three dimensional (3D) graphics applications have become very important workloads running on today's computer systems. A cost-effective graphics solution is to perform geomet...
The Multiscalar architecture advocates a distributed processor organization and task-level speculation to exploit high degrees of instruction level parallelism (ILP) in sequential...
This paper provides quantitative measurements of load latency tolerance in a dynamically scheduled processor. To determine the latency tolerance of each memory load operation, our...
This paper evaluates several mechanisms for repairing the return-address stack after branch mispredictions. The return-address stack is a small but important structure for achievi...
Kevin Skadron, Pritpal S. Ahuja, Margaret Martonos...
Media applications are characterized by large amounts of available parallelism, little data reuse, and a high computation to memory access ratio. While these characteristics are p...
Scott Rixner, William J. Dally, Ujval J. Kapasi, B...
Load latency remains a significant bottleneck in dynamically scheduled pipelined processors. Load speculation techniques have been proposed to reduce this latency. Dependence Pred...
The inherent instruction-level parallelism (ILP) of current applications (specially those based on floating point computations) has driven hardware designers and compilers writers...