It is well recognized that LRU cache-line replacement can be ineffective for applications with large working sets or non-localized memory access patterns. Specifically, in lastlevel processor caches, LRU can cause cache pollution by inserting non-reuseable elements into the cache while evicting reusable ones. The work presented in this paper addresses last-level cache pollution through a dynamic operating system mechanism, called ROCS, requiring no change to underlying hardware and no change to applications. ROCS employs hardware performance counters on a commodity processor to characterize application cache behavior at run-time. Using this online profiling, cache unfriendly pages are dynamically mapped to a pollute buffer in the cache, eliminating competition between reusable and nonreusable cache lines. The operating system implements the pollute buffer through a page-coloring based technique, by dedicating a small slice of the last-level cache to store nonreusable pages. Measurem...
Livio Soares, David K. Tam, Michael Stumm