The large latency of memory accesses is a major obstacle to achieving high processor utilization in large-scale shared-memory multiprocessors. Access to remote memory is likely to remain slow relative to ever-increasing processor speeds. Thus, any scalable architecture must rely on techniques that reduce, hide, or tolerate remote-memory-access latencies. In this paper, we consider two architectural techniques that address the latency problem: Prefetching and Multithreading. We also develop and analyse these techniques using simple but useful analytical models that predict the performance benefits achievable on bus-based multiprocessors. First, we study the effects of parameters such as latency, bandwidth, and degree of prefetching on the speed-up and network utilization of the system. Then, using a multilevel modeling methodology for Petri nets, we show that multithreaded architectures have higher processor utilizat...
Edward D. Moreno, Sergio Takeo Kofuji, Marcelo H.