We are currently developing Willow, a shared-memory multiprocessor whose design provides system capacity and performance capable of supporting over a thousand commercial microprocessors. Most recently, we have focused our attention on the design of a sixty-four processor prototype that tests most of our ideas about scalability. The design of such a multiprocessor poses a number of challenges to the computer architect. In this paper we describe the factors that traditionally have limited the scalability of shared-memory systems. These include: enforcing sequential consistency, inefficient synchronization, memory latency and bandwidth limitations, bus memory contention, the necessity to enforce inclusion on lower-level caches, and limited I/O bandwidth. We then describe how the Willow architecture addresses each of these issues. Finally, we present data that evaluates the effect of the major architectural innovations in Willow on the performance of several parallel applications. These inno...
John K. Bennett, Sandhya Dwarkadas, Jay A. Greenwo