
ISHPC
2003
Springer

Code and Data Transformations for Improving Shared Cache Performance on SMT Processors

Simultaneous multithreading (SMT) processors use shared on-chip caches, which yield better cost-performance ratios, but sharing a cache between simultaneously executing threads causes excessive conflict misses. This paper proposes software solutions for dynamically partitioning the shared cache of an SMT processor, using three methods from the optimizing-compilers literature: dynamic tiling, copying, and block data layouts. The paper presents an algorithm that combines these transformations, together with two runtime mechanisms that detect cache sharing between threads and react to it at runtime: the first uses minimal kernel extensions, and the second uses information collected from the processor's hardware performance counters. Experimental results show that for regular, perfect loop nests these transformations cope effectively with shared caches. When the cache is shared between threads from the same address space, performance improves by 16–29% on average…
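The paper's algorithm is not reproduced on this page; as a rough, non-authoritative sketch of the dynamic-tiling idea mentioned in the abstract, the C fragment below shrinks the tile size of a blocked matrix multiply when a co-running thread is detected sharing the cache. The cache capacity, the environment-variable detection stub, and the tile-selection heuristic are illustrative assumptions, not the paper's actual mechanisms (which rely on kernel extensions or hardware counters).

    #include <stddef.h>
    #include <stdlib.h>

    #define L2_BYTES   (512 * 1024)   /* assumed shared L2 capacity (illustrative) */
    #define ELEM_BYTES sizeof(double)

    /* Hypothetical stand-in for the paper's detection mechanisms
     * (kernel extension or hardware performance counters); here it
     * just checks an environment variable so the sketch is self-contained. */
    static int detect_cache_sharing(void)
    {
        return getenv("CACHE_SHARED") != NULL;
    }

    /* Pick the largest power-of-two tile size T such that three T x T
     * double blocks (one each from A, B, C) fit in this thread's
     * share of the cache. */
    static size_t pick_tile(void)
    {
        size_t budget = detect_cache_sharing() ? L2_BYTES / 2 : L2_BYTES;
        size_t t = 16;
        while (3 * (2 * t) * (2 * t) * ELEM_BYTES <= budget)
            t *= 2;
        return t;
    }

    /* Blocked (tiled) matrix multiply C += A * B on n x n row-major
     * matrices; the tile size is chosen at run time, not compile time. */
    void matmul_dynamic_tile(size_t n, const double *A, const double *B, double *C)
    {
        size_t T = pick_tile();
        for (size_t ii = 0; ii < n; ii += T)
            for (size_t kk = 0; kk < n; kk += T)
                for (size_t jj = 0; jj < n; jj += T)
                    for (size_t i = ii; i < ii + T && i < n; i++)
                        for (size_t k = kk; k < kk + T && k < n; k++)
                            for (size_t j = jj; j < jj + T && j < n; j++)
                                C[i * n + j] += A[i * n + k] * B[k * n + j];
    }

In this sketch the working-set budget is simply halved when a co-runner is detected; the paper's approach additionally applies copying and block data layouts to keep each thread's tiles in disjoint cache regions.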
Dimitrios S. Nikolopoulos
Added 07 Jul 2010
Updated 07 Jul 2010
Type Conference
Year 2003
Where ISHPC
Authors Dimitrios S. Nikolopoulos