Abstract. We present new performance models and a new, more compact data structure for cache blocking when applied to the sparse matrixvector multiply (SpM×V) operation, y ← y +...
Rajesh Nishtala, Richard W. Vuduc, James Demmel, K...
With the advance of high-speed network technologies, the availability and popularity of streaming media content over the Internet has grown rapidly in recent years. The delivery a...
Addresses suffering from cache misses typically exhibit repetitive patterns due to the temporal locality inherent in the access stream. However, we observe that the number of inte...
While set-associative caches incur fewer misses than directmapped caches, they typically have slower hit times and higher power consumption, when multiple tag and data banks are p...
Caching techniques have been an efficient mechanism for mitigating the effects of the processor-memory speed gap. Traditional multi-level SRAM-based cache hierarchies, especially...