A key challenge in achieving high performance on software DSM systems is overcoming their relatively large communication latencies. In this paper, we consider two techniques which address this problem: prefetching and multithreading. While previous studies have examined each of these techniques in isolation, this paper is the first to evaluate both techniques using a consistent hardware platform and set of applications, thereby allowing direct comparisons. In addition, this is the first study to consider combining prefetching and multithreading in a software DSM. We performed our experiments on real hardware using a full implementation of both techniques. Our experimental results demonstrate that both prefetching and multithreading result in significant performance improvements when applied individually. In addition, we observe that three of the eight applications achieve the best overall performance by combining both techniques such that prefetching hides memory latency and multithreading...
Todd C. Mowry, Charles Q. C. Chan, Adley K. W. Lo