Media applications are characterized by large amounts of available parallelism, little data reuse, and a high computation to memory access ratio. While these characteristics are p...
Scott Rixner, William J. Dally, Ujval J. Kapasi, B...
Load latency remains a significant bottleneck in dynamically scheduled pipelined processors. Load speculation techniques have been proposed to reduce this latency. Dependence Pred...
The inherent instruction-level parallelism (ILP) of current applications (specially those based on floating point computations) has driven hardware designers and compilers writers...
Clustering is one solution to the demand for wideissue machines and fast clock cycles because it allows for smaller, less ported register files and simpler bypass logic while rema...
This paper proposes a precise approach to register allocation for irregular-register architectures which is based on 0-1 integer programming (IP). Prior work shows that IP registe...
To achieve highly accurate branch prediction, it is necessary not only to allocate more resources to branch prediction hardware but also to improve the understanding of branch exe...
The fill unit is the structure which collects blocks of instructions and combines them into multi-block segments for storage in a trace cache. In this paper, we expand the role of...