Blocking is a well-known optimization technique for improving the effectiveness of memory hierarchies. Instead of operating on entire rows or columns of an array, blocked algorith...
Monica S. Lam, Edward E. Rothberg, Michael E. Wolf
Caches enhance the performance of multiprocessors by reducing network trac and average memory access latency. However, cache-based systems must address the problem of cache coher...
Multiprocessor memory reference traces provide a wealth of information on the behavior of parallel programs. We have used this information to explore the relationship between kern...
William J. Bolosky, Michael L. Scott, Robert P. Fi...
Access/execute architectures have several advantages over more traditional architectures. Because address generation and memory access are decoupled from operand use, memory laten...
The memory consistency model supported by a multiprocessor architecture determines the amount of buffering and pipelining that may be used to hide or reduce the latency of memory ...
Kourosh Gharachorloo, Anoop Gupta, John L. Henness...