Cache misses in small, limited-associativity primary caches very often replace live cache blocks, given the dominance of capacity and conflict misses. Towards motivating novel cach...
Performance loss due to long-latency memory accesses can be reduced by servicing multiple memory accesses concurrently. The notion of generating and servicing long-latency cache m...
Moinuddin K. Qureshi, Daniel N. Lynch, Onur Mutlu,...
Software-controlled cache prefetching and data forwarding are widely used techniques for tolerating memory latency in shared memory multiprocessors. However, some previous studies...
Dynamic binary translators (DBT) have recently attracted much attention for embedded systems. The effective implementation of DBT in these systems is challenging due to tight cons...
Abstract--The 2-D Discrete Wavelet Transform (DWT) consumes up to 68% of the JPEG2000 encoding time. In this paper, we develop efficient implementations of this important kernel on...
Asadollah Shahbahrami, Ben H. H. Juurlink, Stamati...