Centralized register file architectures scale poorly in terms of clock rate, chip area, and power consumption and are thus not suitable for consumer electronic devices. The conse...
To make efficient use of CMPs with tens to hundreds of cores, it is often necessary to exploit fine-grain parallelism. However, managing tasks of a few thousand instructions is ...
Daniel Sanchez, Richard M. Yoo, Christos Kozyrakis
In SMT processors, the complex interplay between private and shared datapath resources needs to be considered in order to realize the full performance potential. In this paper, we...
Simultaneous Multithreading machines fetch and execute instructions from multiple instruction streams to increase system utilization and speedup the execution of jobs. When there ...
We use Kleene algebra with tests to verify a wide assortment of common compiler optimizations, including dead code elimination, common subexpression elimination, copy propagation,...