Current microprocessors aggressively exploit instruction-level parallelism (ILP) through techniques such as multiple issue, dynamic scheduling, and non-blocking reads. Recent work has shown that memory latency remains a significant performance bottleneck for shared-memory multiprocessor systems built of such processors. This paper provides the first study of the effectiveness of software-controlled non-binding prefetching in shared-memory multiprocessors built of state-of-the-art ILP-based processors. We find that software prefetching results in significant reductions in execution time (12% to 31%) for three out of five applications on an ILP system. However, compared to previous-generation systems, software prefetching is significantly less effective in reducing the memory stall component of execution time on an ILP system. Consequently, even after adding software prefetching, memory stall time accounts for over 30% of the total execution time in four out of five applications on our ILP system. Thi...
Parthasarathy Ranganathan, Vijay S. Pai, Hazim Abd