—The potential for improving the performance of data-intensive scientific programs by enhancing data reuse in cache is substantial because CPUs are significantly faster than memory. Traditional performance tools typically collect or simulate cache miss counts or rates and attribute them at the function level. While such information identifies program scopes that exhibit a large cache miss rate, it is often insufficient to diagnose the causes for poor data locality and to identify what program transformations would improve memory hierarchy utilization. This paper describes an approach that uses memory reuse distance to identify an application’s most significant memory access patterns causing cache misses and provide insight into ways of improving data reuse. Unlike previous approaches, our tool combines (1) analysis and instrumentation of fully optimized binaries, (2) online analysis of reuse patterns, (3) fine-grain attribution of measurements and models to statements, loops ...
Gabriel Marin, John M. Mellor-Crummey