This paper analyzes the impact of hardware multithreading support on the performance of distributed shared-memory (DSM) multiprocessors built out of heterogeneous, single-chip computing nodes. Area-efficiency arguments motivate a heterogeneous, hierarchical organization (HDSM) consisting of a few processors with extensive support for instruction-level parallelism and large caches, and a larger number of simpler processors with smaller caches for efficient execution of thread-parallel code. Such a heterogeneous machine relies on the execution of multiple threads per processor to deliver high performance for unmodified applications. This paper quantitatively studies the performance of HDSMs for software-based and hardware-multithreaded scenarios. The simulation-based experiments in this paper consider a 16-node multiprocessor, six homogeneous shared-memory benchmarks from the SPLASH-2 suite, and a decision-support application (C4.5). Simulation results show that a hardware-based, block-multithreaded H...
Renato J. O. Figueiredo, Jeffrey P. Bradford, Jos&