Sciweavers


MICRO
1999
IEEE


Code Transformations to Improve Memory Parallelism

Download www.jilp.org

Current microprocessors incorporate techniques to exploit instruction-level parallelism (ILP). However, previous work has shown that these ILP techniques are less effective in removing memory stall time than CPU time, making the memory system a greater bottleneck in ILP-based systems than in previous-generation systems. These deficiencies arise largely because applications present limited opportunities for an out-of-order issue processor to overlap multiple read misses, the dominant source of memory stalls. This work proposes code transformations to increase parallelism in the memory system by overlapping multiple read misses within the same instruction window, while preserving cache locality. We present an analysis and transformation framework suitable for compiler implementation. Our simulation experiments show execution time reductions averaging 20% in a multiprocessor and 30% in a uniprocessor. A substantial part of these reductions comes from increases in memory parallelism. We se...

Vijay S. Pai, Sarita V. Adve


Hardware | Memory Stall Time | Memory Stalls | MICRO 1999 | Multiple Read Misses


Related Content

» Automatic code generation for executing tiled nested loops onto parallel architectures

» Multithreaded Geant4: Semi-automatic Transformation into Scalable Thread-Parallel Software

» Parallelization of Benchmarks for Scalable Shared-Memory Multiprocessors

» Improving Offset Assignment on Embedded Processors Using Transformations

» Loop Transformation Methodologies for Array-Oriented Memory Management

» An Efficient SIMD Architecture with Parallel Memory for 2D Cosine Transforms of Video Codi...

» A Compiler Optimization Algorithm for Shared-Memory Multiprocessors

» Improving the Memory Bandwidth Utilization Using Loop Transformations

» Memory Access Coalescing: A Technique for Eliminating Redundant Memory Accesses

Post Info

Added	04 Aug 2010
Updated	04 Aug 2010
Type	Conference
Year	1999
Where	MICRO
Authors	Vijay S. Pai, Sarita V. Adve
