The performance of most embedded systems is critically dependent on the average memory access latency. Improving the cache hit rate can have significant positive impact on the per...
Dynamic voltage and frequency scaling is increasingly being used to reduce the energy requirements of embedded and real-time applications by exploiting idle CPU resources, while s...
Christian Poellabauer, Leo Singleton, Karsten Schw...
We present a methodology for off-chip memory bandwidth minimization through application-driven L2 cache partitioning in multicore systems. A major challenge with multi-core system...
While caches have become invaluable for higher-end architectures due to their ability to hide, in part, the gap between processor speed and memory access times, caches (and partic...
Sparse matrix-vector multiplication is an important kernel that often runs inefficiently on superscalar RISC processors. This paper describes techniques that increase instruction-...