Memory performance is increasingly determining microprocessor performance and technology trends are exacerbating this problem. Most architectures use set-associative caches with LRU replacement policies to combine fast access with relatively low miss rates. To improve replacement decisions in set-associative caches, we develop a new set of compiler algorithms that predict which data will and will not be reused and provide these hints to the architecture. We prove that the hints either match or improve hit rates over LRU. We describe a practical one-bit cache-line tag implementation of our algorithm, called evict-me. On a cache replacement, the architecture will replace a line for which the evict-me bit is set, or if none is set, it will use the LRU bits. We implement our compiler analysis and its output in the Scale compiler. On a variety of scientific programs, using the evict-me algorithm in both the level 1 and 2 cache improves simulated cycle times by up to 34% over the LRU polic...
Zhenlin Wang, Kathryn S. McKinley, Arnold L. Rosen