In this paper, we present and evaluate two techniques that use different styles of hardware support to provide data structure specific processor cache information. In one approach, hardware performance counter overflow interrupts are used to sample cache misses. In the other, cache misses within regions of memory are counted to perform an n-way search for the areas in which the most misses are occurring. We present a simulation-based study and comparison of the two techniques. We find that both techniques can provide accurate information, and describe the relative advantages and disadvantages of each.
Bryan R. Buck, Jeffrey K. Hollingsworth