This paper explores area/parallelism tradeo s in the design of distributed shared-memory (DSM) multiprocessors built out of large single-chip computing nodes. In this context, area-e ciency arguments motivate a heterogeneous organization consisting of few nodes with large caches designed for single-thread parallelism, and a larger number of nodes with smaller caches designed for multi-thread parallelism. Quantitative performance of such organization is reported for a set of homogeneous multiprocessor programs from the SPLASH-2 benchmark suite. These programs are mapped onto the heterogeneous processors without source code modi cations via static thread assignment policies. Simulationbased analysis is used to compare the performance of heterogeneous and homogeneous DSMs that occupy the same silicon area. The analysis shows that a 4-node heterogeneous DSM with 21 processors outperforms its homogeneous counterpart with 4 processors by an average of 36% for the studied multiprocessor work...
Renato J. O. Figueiredo, José A. B. Fortes