Both hardware and software prefetching have been shown to be effective in tolerating the large memory latencies inherent in shared-memory multiprocessors; however, both types of prefetching have their shortcomings. In this paper, we propose an integrated hardware/software prefetching method that uses simple hardware that can handle most data accesses and software prefetching for the few remaining accesses. This yields an effective scheme that minimizes both CPU overhead and hardware costs. Execution-driven simulations show our method to be very effective.
Edward H. Gornish, Alexander V. Veidenbaum