Access/execute architectures have several advantages over more traditional architectures. Because address generation and memory access are decoupled from operand use, memory laten...
Blocking is a well-known optimization technique for improving the effectiveness of memory hierarchies. Instead of operating on entire rows or columns of an array, blocked algorith...
Monica S. Lam, Edward E. Rothberg, Michael E. Wolf
Multiprocessor memory reference traces provide a wealth of information on the behavior of parallel programs. We have used this information to explore the relationship between kern...
William J. Bolosky, Michael L. Scott, Robert P. Fi...
The memory consistency model supported by a multiprocessor architecture determines the amount of buffering and pipelining that may be used to hide or reduce the latency of memory ...
Kourosh Gharachorloo, Anoop Gupta, John L. Henness...