Memory latency is becoming an increasingly important performance bottleneck, especially in multiprocessors. One technique for tolerating memory latency is multithreading, whereby we switch between threads upon expensive cache misses. In contrast with previous work on multithreading, we explore a new approach that is software-controlled rather than hardware-controlled. To implement software-controlled multithreading, we use informing memory operations to quickly trap upon cache misses to a miss handler which performs the actual thread switching in software. Our experimental results demonstrate that software-controlled multithreading can result in significant performance gains on a shared-memory multiprocessor, with the majority of applications speeding up by 10% or more, and one application speeding up by 16%. In addition, we find that by selectively applying a register partitioning optimization to reduce the thread-switching overhead, we can increase the overall speedups to as much as 25%. Gi...
Todd C. Mowry, Sherwyn R. Ramkissoon