In this paper we propose a design technique to pipeline cache memories for high bandwidth applications. With the scaling of technology cache access latencies are multiple clock cy...
A new set of boundary-handling algorithms has been developed for discrete wavelet transforms in the ISO/IEC JPEG-2000 Still Image Coding Standard. Two polyphase component extrapol...
Christopher M. Brislawn, Brendt Wohlberg, Allon G....
Hard-to-predict branches depending on longlatency cache-misses have been recognized as a major performance obstacle for modern microprocessors. With the widening speed gap between...
Hongliang Gao, Yi Ma, Martin Dimitrov, Huiyang Zho...
— Modern CPUs operate at GHz frequencies, but the latencies of memory accesses are still relatively large, in the order of hundreds of cycles. Deeper cache hierarchies with large...
Konrad Malkowski, Greg M. Link, Padma Raghavan, Ma...
With faster CPU clocks and wider pipelines, all relevant microarchitecture components should scale accordingly. There have been many proposals for scaling the issue queue, registe...