Current high performance computer systems use complex, large superscalar CPUs that interface to the main memory through a hierarchy of caches and interconnect systems. These CPU-c...
Numerical applications frequently contain nested loop structures that process large arrays of data. The execution of these loop structures often produces memory preference pattern...
Yoji Yamada, John Gyllenhall, Grant Haab, Wen-mei ...
We present a methodology for off-chip memory bandwidth minimization through application-driven L2 cache partitioning in multicore systems. A major challenge with multi-core system...
Cache misses form a major bottleneck for memory-intensive applications, due to the significant latency of main memory accesses. Loop tiling, in conjunction with other program tran...
The performance of most embedded systems is critically dependent on the memory hierarchy performance. In particular, higher cache hit rate can provide significant performance boos...