Understanding why the performance of a multithreaded program does not improve linearly with the number of cores in a sharedmemory node populated with one or more multicore process...
Efficient performance tuning of parallel programs is often hard. In this paper we describe an approach that uses a uni-processor execution of a multithreaded program as reference ...
This paper presents a new approach for analyzing the performance of grid scheduling algorithms for tasks with dependencies. Finding the optimal procedures for DAG scheduling in Gr...
In this paper we propose a new approach for scheduling data parallel applications on the Grid using irregular array distributions. We implement the scheduler as a new case study fo...
Abstract—In this paper, a tool named CheCUDA is designed to checkpoint CUDA applications that use GPUs as accelerators. As existing checkpoint/restart implementations do not supp...