A common debugging strategy involves reexecuting a program (on a given input) over and over, each time gaining more information about bugs. Such techniques can fail on message-pas...
We describe a group membership protocol, called the timewheel group membership protocol, for a timed asynchronous distributed system. This protocol is a part of the timew...
This paper shows how lightpath-based networks can allow challenging, fine-grained parallel supercomputing applications to be run on a grid, using parallel retrograde analysis on ...
Kees Verstoep, Jason Maassen, Henri E. Bal, John W...
Most parallel machines, such as clusters, are space-shared in order to isolate batch parallel applications from each other and optimize their performance. However, this leads to lo...
In this paper we present an algorithm for scheduling parallel applications that consist of a divisible workload. Our algorithm uses multiple rounds to overlap communication and co...