The increasing complexity of hardware features for recent processors makes high performance code generation very challenging. In particular, several optimization targets have to b...
This paper focuses on load balancing issues associated with hybrid programming models for the parallelization of fully permutable nested loops onto SMP clusters. Hybrid paralle...
Most current data dependence tests cannot handle loop bounds or array subscripts that are symbolic, nonlinear expressions, e.g. A(n*i+j), where 0 ≤ j ≤ n. In this paper, we describe a d...
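
To make the subscript pattern concrete, the following is a minimal sketch that enumerates the elements touched by a subscript of the form n*i+j for small numeric values of n and reports which outer iterations collide. It is a brute-force illustration only and not the dependence test the abstract describes: a compiler must reason about n symbolically, whereas this sketch assumes n is known. The function name `cross_iteration_conflicts` and its parameters are hypothetical.

```python
from itertools import product


def cross_iteration_conflicts(n, outer_trips, j_max):
    """Return sorted tuples of outer iterations i that touch a common element n*i + j.

    Assumption for illustration: i ranges over 0..outer_trips-1 and
    j ranges over 0..j_max, with n a known integer.
    """
    touched = {}  # element index -> set of outer iterations that access it
    for i, j in product(range(outer_trips), range(j_max + 1)):
        touched.setdefault(n * i + j, set()).add(i)

    conflicts = set()
    for iterations in touched.values():
        if len(iterations) > 1:
            conflicts.add(tuple(sorted(iterations)))
    return sorted(conflicts)


# With j restricted to 0..n-1, each outer iteration owns a disjoint block.
print(cross_iteration_conflicts(n=4, outer_trips=3, j_max=3))
# With j allowed to reach n, adjacent outer iterations overlap at the block edge.
print(cross_iteration_conflicts(n=4, outer_trips=3, j_max=4))
```

The two calls differ only in the upper bound on j, which shows why the bound matters: whether the outer loop carries a dependence depends on how the range of j compares with n, and that relationship is exactly what a symbolic test has to establish without knowing n.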
We address the problem of efficient out-of-core code generation for a special class of imperfectly nested loops encoding tensor contractions arising in quantum chemistry computati...
When parallel programs are executed on multiprocessors with private caches, a set of data may be repeatedly used and modified by different threads. Such data sharing can often r...