Sciweavers

64 search results - page 10 / 13
» Optimizing Array Distributions in Data-Parallel Programs
IEEEPACT
1998
IEEE
13 years 11 months ago
A Matrix-Based Approach to the Global Locality Optimization Problem
Global locality analysis is a technique for improving the cache performance of a sequence of loop nests through a combination of loop and data layout optimizations. Pure loop tran...
Mahmut T. Kandemir, Alok N. Choudhary, J. Ramanuja...
PLDI
2011
ACM
12 years 10 months ago
The tao of parallelism in algorithms
For more than thirty years, the parallel programming community has used the dependence graph as the main abstraction for reasoning about and exploiting parallelism in “regular...
Keshav Pingali, Donald Nguyen, Milind Kulkarni, Ma...
MICRO
2000
IEEE
13 years 7 months ago
An Advanced Optimizer for the IA-64 Architecture
level of abstraction, compared with the program representation for scalar optimizations. For example, loop unrolling and loop unroll-and-jam transformations exploit the large regist...
Rakesh Krishnaiyer, Dattatraya Kulkarni, Daniel M....
WOMPAT
2004
Springer
14 years 24 days ago
Dragon: A Static and Dynamic Tool for OpenMP
A program analysis tool can play an important role in helping users understand and improve OpenMP codes. Dragon is a robust interactive program analysis tool based on the Open64 co...
Oscar Hernandez, Chunhua Liao, Barbara M. Chapman
LCPC
2001
Springer
13 years 12 months ago
Strength Reduction of Integer Division and Modulo Operations
Integer division, modulo, and remainder operations are expressive and useful operations. They are logical candidates to express complex data accesses such as the wrap-around behav...
Jeffrey Sheldon, Walter Lee, Ben Greenwald, Saman ...