The caching behavior of multimedia applications has been described as having high instruction reference locality within small loops, very large working sets, and poor data cache p...
This paper proposes a new way of efficiently doing arbitrary ¢ -bit permutations in programmable processors modeled on the theory of omega and flip networks. The new omflip ins...
Instruction fetch behavior has been shown to be very regular and predictable, even for diverse application areas. In this work, we propose the Lookahead Instruction Fetch Engine (...
Stephen Roderick Hines, Yuval Peress, Peter Gavin,...
The current trend is for processors to deliver dramatic improvements in parallel performance while only modestly improving serial performance. Parallel performance is harvested th...
Sanjeev Kumar, Daehyun Kim, Mikhail Smelyanskiy, Y...
The floating-point multiply-add fused (MAF) unit sets a new trend in the processor design to speed up floatingpoint performance in scientific and multimedia applications. This ...