In this paper, we present an efficient algorithm, called CASS-II, for task clustering without task duplication. Unlike the DSC algorithm, which is empirically the best known algor...
Efficient storage and retrieval of large multidimensional datasets is an important concern for large-scale scientific computations such as long-running time-dependent simulations w...
Nested data-parallel programs often have large memory requirements due to their high degree of parallelism. Piecewise execution is an implementation technique used to min...
We describe a software solution to the problem of automatic parallelization of linear algebra code on multi-processor and multi-core architectures. This solution relies on the defi...
The aim of the paper is to introduce techniques to optimize the parallel execution time of sorting on heterogeneous platforms (processor speeds are related by a constant...