Efficient performance tuning of parallel programs is often hard. In this paper we describe an approach that uses a uni-processor execution of a multithreaded program as reference ...
Many irregular scientific computing problems can be modeled by directed acyclic task graphs (DAGs). In this paper, we present an efficient run-time system for executing general as...
This paper presents the design and implementation of the MPI-IO interface for the Clusterfile parallel file system. The approach offers the opportunity of achieving a high correlat...
This paper addresses the problem of efficient execution of a batch of data-intensive tasks with batch-shared I/O behavior, on coupled storage and compute clusters. Two scheduling...
The emergence of multicore processors has heightened the need for effective parallel programming practices. In addition to writing new parallel programs, the next generation of pr...
William Thies, Vikram Chandrasekhar, Saman P. Amar...