Given the large communication overheads characteristic of modern parallel machines, optimizations that eliminate, hide or parallelize communication may improve the performance of ...
I consider the problem of the domain-specific optimization of programs. I review different approaches, discuss their potential, and sketch instances of them from the practice of ...
This paper presents an extensive characterization, tuning, and optimization of parallel I/O on the Cray XT supercomputer, named Jaguar, at Oak Ridge National Laboratory. We have c...
In this paper, we consider the problem of nding llpreserving ordering of a sparse symmetric and positive de nite matrix such that the reordered matrix is suitable for parallel fac...
Traditional list schedulers order instructions based on an optimistic estimate of the load latency imposed by the hardware and therefore cannot respond to variations in memory lat...