Data parallel compilers have long aimed to equal the performance of carefully hand-optimized parallel codes. For tightly-coupled applications based on line sweeps, this goal has b...
Exploiting compile time knowledge to improve memory bandwidth can produce noticeable improvements at run-time [13, 1]. Allocating the data structure [13] to separate memories when...
Block-wise access to data is a central theme in the design of efficient external memory (EM) algorithms. A second important issue, when more than one disk is present, is fully par...
Frank K. H. A. Dehne, David A. Hutchinson, Anil Ma...
This paper describes a monitoring environment that enables the analysis of memory access behavior of applications in a selective way with a potentially very high degree of detail. ...
Edmond Kereku, Tianchao Li, Michael Gerndt, Josef ...
The advent of multicores presents a promising opportunity for exploiting fine grained parallelism present in programs. Programs parallelized in the above fashion, typically involv...