Compiler optimizations are often driven by specific assumptions about the underlying architecture and implementation of the target machine. For example, when targeting shared-memory multiprocessors, parallel programs are compiled to minimize sharing, in order to decrease high-cost, inter-processor communication. This paper reexamines several compiler optimizations in the context of simultaneous multithreading (SMT), a processor architecture that issues instructions from multiple threads to the functional units each cycle. Unlike shared-memory multiprocessors, SMT provides and benefits from fine-grained sharing of processor and memory system resources; unlike current uniprocessors, SMT exposes and benefits from inter-thread instruction-level parallelism when hiding latencies. Therefore, optimizations that are appropriate for these conventional machines may be inappropriate for SMT. We revisit three optimizations in this light: loop-iteration scheduling, software speculative e...
Jack L. Lo, Susan J. Eggers, Henry M. Levy, Sujay