Code and Data Transformations for Improving Shared Cache Performance on SMT Processors

Simultaneous multithreaded processors use shared on-chip caches, which yield better cost-performance ratios. Sharing a cache between simultaneously executing threads causes excessive conflict misses. This paper proposes software solutions for dynamically partitioning the shared cache of an SMT processor, via the use of three methods originating in the optimizing compilers literature: dynamic tiling, copying and block data layouts. The paper presents an algorithm that combines these transformations and two runtime mechanisms to detect cache sharing between threads and react to it at runtime. The first mechanism uses minimal kernel extensions and the second mechanism uses information collected from the processor hardware counters. Our experimental results show that for regular, perfect loop nests, these transformations are very effective in coping with shared caches. When the caches are shared between threads from the same address space, performance is improved by 16–29% on average....
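
The abstract does not give the paper's algorithm, so the following is only a minimal sketch of the general idea behind dynamic tiling: a tiled loop nest whose tile size is chosen at run time from the share of the cache a thread can expect to use. The names `pick_tile_size`, `tiled_matmul`, and the `sharers` parameter are hypothetical, and the detection of cache sharing (kernel extensions or hardware counters in the paper) is not modeled here; the caller simply passes in the number of threads assumed to share the cache. Python is used to show the loop structure only and will not reproduce the cache behavior of compiled code.

```python
def pick_tile_size(cache_bytes, elem_bytes, sharers):
    """Largest T such that three T x T tiles fit in one thread's cache share.

    Assumption (not from the paper): the cache is split evenly among
    `sharers` threads, and the working set is one tile each of A, B and C.
    """
    share = cache_bytes // max(1, sharers)
    t = 1
    while 3 * (t + 1) * (t + 1) * elem_bytes <= share:
        t += 1
    return t


def tiled_matmul(a, b, n, tile):
    """Multiply two n x n matrices (lists of lists) using square tiles."""
    c = [[0.0] * n for _ in range(n)]
    # Outer three loops walk over tiles; inner three loops work inside a tile.
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    row_a = a[i]
                    row_c = c[i]
                    for k in range(kk, min(kk + tile, n)):
                        aik = row_a[k]
                        row_b = b[k]
                        for j in range(jj, min(jj + tile, n)):
                            row_c[j] += aik * row_b[j]
    return c


if __name__ == "__main__":
    n = 5
    a = [[i + j for j in range(n)] for i in range(n)]
    b = [[i - j for j in range(n)] for i in range(n)]
    # Hypothetical 32 KiB cache, 8-byte elements, shared by two threads.
    tile = pick_tile_size(32 * 1024, 8, sharers=2)
    print(tile, tiled_matmul(a, b, n, min(tile, n))[0])
```

In this sketch, a thread that learns it has gained or lost a cache-sharing neighbor would call `pick_tile_size` again with the new `sharers` value before its next pass over the data; the result of the multiplication does not depend on the tile size, only the memory access order does.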

Dimitrios S. Nikolopoulos

Distributed And Parallel Computing | ISHPC 2003 | On-chip Caches | SMT Processors | Threads

Post Info

Added	07 Jul 2010
Updated	07 Jul 2010
Type	Conference
Year	2003
Where	ISHPC
Authors	Dimitrios S. Nikolopoulos
