Modern graphics processing units (GPUs) include hardware-controlled caches to reduce bandwidth requirements and energy consumption. However, current GPU cache hierarchies are inefficient for general-purpose GPU (GPGPU) computing. GPGPU workloads tend to include data structures that would not fit in any reasonably sized cache, which leads to very low cache hit rates. This problem is exacerbated by the design of current GPUs, which share small caches among many threads. Caching these streaming data structures needlessly burns power while evicting data that might otherwise fit in the cache. We propose a GPU cache management technique that improves the efficiency of small GPU caches while further reducing their power consumption. It adaptively bypasses the GPU cache for blocks that are unlikely to be referenced again before being evicted. This technique saves energy by avoiding needless insertions and evictions, and it improves performance by avoiding cache pollution. We show that, wi...
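The idea of adaptively bypassing blocks that are unlikely to be reused can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' mechanism. It assumes a hypothetical predictor: a table of saturating counters indexed by a per-load "signature" (for example, a hashed instruction address). Each counter tracks whether blocks inserted by that signature tend to be evicted without reuse. On a miss, the cache skips allocation when the counter reaches a threshold. The names `BypassPredictor`, `BypassingCache`, and `run_workload`, the table size, the threshold, and the fully associative LRU organization are all illustrative choices.

```python
from collections import OrderedDict


class BypassPredictor:
    """Hypothetical table of saturating counters indexed by a load signature.

    A high counter means blocks inserted by that signature have recently
    tended to be evicted before being referenced again.
    """

    def __init__(self, entries=64, threshold=2, max_count=3):
        self.entries = entries
        self.threshold = threshold
        self.max_count = max_count
        self.counters = [0] * entries

    def _index(self, sig):
        return sig % self.entries

    def should_bypass(self, sig):
        return self.counters[self._index(sig)] >= self.threshold

    def train_dead(self, sig):
        # Block was evicted without reuse: lean toward bypassing this signature.
        i = self._index(sig)
        self.counters[i] = min(self.counters[i] + 1, self.max_count)

    def train_reused(self, sig):
        # Block was hit after insertion: lean toward caching this signature.
        i = self._index(sig)
        self.counters[i] = max(self.counters[i] - 1, 0)


class BypassingCache:
    """Fully associative LRU cache that consults the predictor on every miss."""

    def __init__(self, capacity, predictor, enable_bypass=True):
        self.capacity = capacity
        self.predictor = predictor
        self.enable_bypass = enable_bypass
        self.lines = OrderedDict()  # block address -> [inserting signature, reused flag]
        self.hits = self.misses = self.bypasses = self.insertions = 0

    def access(self, addr, sig):
        if addr in self.lines:
            self.hits += 1
            self.lines.move_to_end(addr)  # mark as most recently used
            entry = self.lines[addr]
            if not entry[1]:  # train only on the first reuse of each insertion
                entry[1] = True
                self.predictor.train_reused(entry[0])
            return True
        self.misses += 1
        if self.enable_bypass and self.predictor.should_bypass(sig):
            self.bypasses += 1  # serve the request without allocating a line
            return False
        if len(self.lines) >= self.capacity:
            _, (victim_sig, reused) = self.lines.popitem(last=False)  # evict LRU
            if not reused:
                self.predictor.train_dead(victim_sig)
        self.lines[addr] = [sig, False]
        self.insertions += 1
        return False


def run_workload(enable_bypass, iterations=100):
    """Three hot blocks (signature 1) interleaved with a stream (signature 2)."""
    cache = BypassingCache(4, BypassPredictor(), enable_bypass)
    for i in range(iterations):
        for hot in (0, 1, 2):
            cache.access(hot, sig=1)
        for j in range(2):
            cache.access(1000 + 2 * i + j, sig=2)  # never revisited
    return cache
```

In this toy workload, the streaming blocks keep pushing the three hot blocks out of a plain LRU cache, so the baseline (`enable_bypass=False`) misses on every access. With bypassing enabled, the predictor learns that signature 2 produces dead blocks, stops allocating them, and the hot blocks stay resident. As a simplification, a signature that is being bypassed is never retrained here. One possible remedy is to keep a few sampled sets that always allocate and train only on those.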
Yingying Tian, Sooraj Puthoor, Joseph L. Greathous