In order to achieve high performance, contemporary microprocessors must effectively process the four major instruction types: ALU, branch, load, and store instructions. This paper...
Bryan Black, Brian Mueller, Stephanie Postal, Ryan...
We introduce dependence-based pre-computation as a complement to history-based target prediction schemes. We present pre-computation in the context of virtual function calls (v-ca...
We revisit memory hierarchy design viewing memory as an inter-operation communication agent. This perspective leads to the development of novel methods of performing inter-operati...
Conventional snoopy-based chip multiprocessors take an aggressive approach broadcasting snoop requests to all nodes. In addition each node checks all received requests. This appro...
level of abstraction, compared with the program representation for scalar optimizations. For example, loop unrolling and loop unrolland-jam transformations exploit the large regist...
Rakesh Krishnaiyer, Dattatraya Kulkarni, Daniel M....