This paper evaluates the benefit of adding a shared cache to the network interface as a means of improving the performance of networked workstations configured as a distributed shared memory multiprocessor. A cache on the network interface, shared by all processors on each cluster, offers the potential benefits of retaining evicted processor cache lines, providing implicit prefetching when network cache lines are longer than processor cache lines, and increasing intra-cluster sharing. Using simulation, the performance of eight parallel scientific applications was evaluated. In each case, we examined in detail the means by which processor cache misses were satisfied. Our results were mixed. For the applications studied, we found that the network cache offers substantial performance benefit when processor caches are too small to hold the application's primary working set, or when network contention limits application performance. The expected benefits of implicit prefetching and increased i...
John K. Bennett, Katherine E. Fletcher, William Ev