This paper presents a hardware-based dynamic optimizer that continuously optimizes an application’s instruction stream. In continuous optimization, dataflow optimizations are p...
Brian Fahs, Todd M. Rafacz, Sanjay J. Patel, Steve...
The floating-point multiply-add fused (MAF) unit sets a new trend in the processor design to speed up floatingpoint performance in scientific and multimedia applications. This ...
Step caches are caches in which data entered to an cache array is kept valid only until the end of ongoing step of execution. Together with an advanced pipelined multithreaded arc...
This paper presents a 64-bit fixed-point vector multiply-accumulator (MAC) architecture capable of supporting multiple precisions. The vector MAC can perform one 64x64, two 32x32...
Current-generation microprocessors are designed to process instructions with one and two source operands at equal cost. Handling two source operands requires multiple ports for ea...