Clustering is one solution to the demand for wideissue machines and fast clock cycles because it allows for smaller, less ported register files and simpler bypass logic while rema...
This paper proposes a precise approach to register allocation for irregular-register architectures which is based on 0-1 integer programming (IP). Prior work shows that IP registe...
To achieve highly accurate branch prediction, it is necessary not only to allocate more resources to branch prediction hardware but also to improve the understanding of branch exe...
The fill unit is the structure which collects blocks of instructions and combines them into multi-block segments for storage in a trace cache. In this paper, we expand the role of...
We present an architecture that features dynamic multithreading execution of a single program. Threads are created automatically by hardware at procedure and loop boundaries and e...