While general-purpose homogeneous multi-core architectures are becoming ubiquitous, there are clear indications that, for a number of important applications, a better performance/p...
The mpC language was developed to write efficient and portable programs for a wide range of distributed memory machines. It supports both task and data parallelism, allows both static...
Dmitry Arapov, Alexey Kalinov, Alexey L. Lastovets...
Graphics processing units (GPUs) are powerful devices capable of rapid parallel computation. GPU programming, however, can be quite difficult, limiting its use to experienced prog...
Dependences among loads and stores whose addresses are unknown hinder the extraction of instruction level parallelism during the execution of a sequential program. Such ambiguous ...
Sridhar Gopal, T. N. Vijaykumar, James E. Smith, G...
There has been much work recently on improving the locality performance of loop nests in scientific programs through the use of loop as well as data layout optimizations. However,...
Mahmut T. Kandemir, Alok N. Choudhary, J. Ramanuja...