In the past decade, processor speed has become significantly faster than memory speed. Small, fast cache memories are designed to overcome this discrepancy, but they are only effe...
Both commercial and scientific workloads benefit from concurrency and exhibit data sharing across threads/processes. The resulting sharing patterns are often fine-grain, with t...
Hemayet Hossain, Sandhya Dwarkadas, Michael C. Hua...
The memory system's large and growing contribution to system performance motivates more aggressive approaches to improving its efficiency. We propose and analyze a memory hie...
— Effective address calculation for load and store instructions needs to compete for ALU with other instructions and hence extra latencies might be incurred to data cache accesse...
A Zero Overhead Loop Buffer (ZOLB) is an architectural feature that is commonly found in DSP processors. This buffer can be viewed as a compiler managed cache that contains a sequ...
Gang-Ryung Uh, Yuhong Wang, David B. Whalley, Sanj...