This paper introduces a dynamic layout optimization strategy to minimize the number of cycles spent in memory accesses in a cache-based memory environment. In this approach, a giv...
N. E. Crosbie, Mahmut T. Kandemir, Ibrahim Kolcu, ...
In computer architecture, caches have primarily been viewed as a means to hide memory latency from the CPU. Cache policies have focused on anticipating the CPU’s data needs, and...
Jeffrey Stuecheli, Dimitris Kaseridis, David Daly,...
Many hardware optimizations rely on collecting information about program behavior at runtime. This information is stored in lookup tables. To be accurate and effective, these opti...
Ioana Burcea, Stephen Somogyi, Andreas Moshovos, B...
This paper presents COBRA (Continuous Binary ReAdaptation), a runtime binary optimization framework, for multithreaded applications. It is currently implemented on Itanium 2 based...
Software prefetching has been demonstrated as a powerful technique to tolerate long load latencies. However, to be effective, prefetching must target the most critical (frequently...