In this paper, we propose a hardware performance monitor that not only measures cache misses and the addresses associated with them, but also identifies which data is evicted from the cache when a miss occurs. We describe how to use this hardware support to efficiently determine the cache behavior of application data structures at the source code level. We also present the results of a simulation-based study of this technique, in which we examined the overhead, the perturbation of results, and the usefulness of collecting this information.
Bryan R. Buck, Jeffrey K. Hollingsworth
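To make the idea of relating misses and evictions to source-level data structures concrete, the following is a minimal sketch and not the authors' implementation. It assumes the monitor delivers samples as pairs of (miss address, evicted address) and that a symbol table of object address ranges is available. The object names, addresses, sample format, and the helper functions `object_for` and `attribute` are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical symbol table: (start address, size in bytes, name).
# A real tool would build this from debug information and allocation tracking.
OBJECTS = sorted([
    (0x1000, 0x400, "matrix_a"),
    (0x2000, 0x400, "matrix_b"),
    (0x3000, 0x100, "lookup_table"),
])
STARTS = [start for start, _, _ in OBJECTS]


def object_for(addr):
    """Return the name of the object containing addr, or None if unmapped."""
    i = bisect_right(STARTS, addr) - 1
    if i < 0:
        return None
    start, size, name = OBJECTS[i]
    return name if addr < start + size else None


def attribute(samples):
    """Count misses per object and (missed object, evicted object) pairs."""
    misses = Counter()
    evictions = Counter()
    for miss_addr, evicted_addr in samples:
        missed = object_for(miss_addr) or "<unknown>"
        evicted = object_for(evicted_addr) or "<unknown>"
        misses[missed] += 1
        evictions[(missed, evicted)] += 1
    return misses, evictions


# Assumed sample format: (address that missed, address of the evicted data).
SAMPLES = [
    (0x1010, 0x2010),
    (0x2010, 0x1010),
    (0x1020, 0x2020),
    (0x3004, 0x5000),
]
MISSES, EVICTIONS = attribute(SAMPLES)
```

In this sketch, a high count for a pair such as `("matrix_a", "matrix_b")` would suggest that the two structures are displacing each other in the cache, which is the kind of source-level insight that eviction information is meant to support.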