Sciweavers

CONCURRENCY
2006

An efficient memory operations optimization technique for vector loops on Itanium 2 processors

To keep up with a large degree of instruction level parallelism (ILP), the Itanium 2 cache systems use a complex organization scheme: load/store queues, banking and interleaving. In this paper, we study the impact of these cache systems on memory instructions scheduling. We demonstrate that, if no care is taken at compile time, the non-precise memory disambiguation mechanism and the banking structure cause severe performance loss, even for very simple regular codes. We also show that grouping the memory operations in a pseudo-vectorized way enables the compiler to generate more effective code for the Itanium 2 processor. The impact of this code optimization technique on register pressure is analyzed for various vectorization schemes.
Keywords: Performance Measurement, Cache Optimization, Memory Access Optimization, Bank Conflicts, Memory Address Disambiguation, Instruction Level Parallelism.
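As a rough illustration of what "grouping the memory operations in a pseudo-vectorized way" could look like at the source level, the sketch below contrasts a plain loop with a version that issues a block of loads together, then the arithmetic, then a block of stores. This is not the authors' code or their exact scheme: the function names, the choice of a DAXPY-style kernel, the block size of 4, and the use of `restrict` to promise that `x` and `y` do not overlap are all assumptions made for the example. A compiler is also free to reorder these accesses, so the source only suggests the intended schedule.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline (hypothetical name): every iteration interleaves
 * two loads with one store. */
void daxpy_interleaved(size_t n, double a,
                       const double *restrict x, double *restrict y)
{
    assert(n == 0 || (x && y));
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Grouped variant (hypothetical name): for each block of 4 elements,
 * do all loads first, then the arithmetic, then all stores.
 * The block size 4 is an arbitrary choice for illustration. */
void daxpy_grouped(size_t n, double a,
                   const double *restrict x, double *restrict y)
{
    assert(n == 0 || (x && y));
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        /* Load group: 8 loads, no store in between. */
        double x0 = x[i], x1 = x[i + 1], x2 = x[i + 2], x3 = x[i + 3];
        double y0 = y[i], y1 = y[i + 1], y2 = y[i + 2], y3 = y[i + 3];

        /* Compute group: results stay in temporaries. */
        double r0 = a * x0 + y0;
        double r1 = a * x1 + y1;
        double r2 = a * x2 + y2;
        double r3 = a * x3 + y3;

        /* Store group: 4 stores, no load in between. */
        y[i]     = r0;
        y[i + 1] = r1;
        y[i + 2] = r2;
        y[i + 3] = r3;
    }

    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Both functions compute the same result. The grouped version keeps more values live at once (here twelve temporaries per block), which is the kind of register-pressure cost the abstract says is analyzed for different vectorization schemes.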
Added 11 Dec 2010
Updated 11 Dec 2010
Type Journal
Year 2006
Where CONCURRENCY
Authors William Jalby, Christophe Lemuet, Sid Ahmed Ali Touati