This paper evaluates the use of per-node multi-threading to hide remote memory and synchronization latencies in a software distributed shared memory (DSM) system. As in hardware systems, multi-threading in software systems can reduce the cost of remote requests by switching to another thread when the current thread blocks. We added multi-threading to the CVM software DSM and evaluated its impact on performance for a suite of common shared-memory programs. Multi-threading improved speed by at least 17% in three of the seven applications in our suite, and by smaller amounts in the other applications. However, we found that i) good performance is not always achievable transparently for non-trivial applications, ii) multi-threading can interact negatively with DSM operations, iii) multi-threading decreases cache and TLB locality, and iv) any multi-threading speedup depends on the availability of work.
Kritchalach Thitikamol, Peter J. Keleher