Scheduling for speculative parallelization is a problem that remained unsolved despite its importance. Simple methods such as Fixed-Size Chunking (FSC) need several ‘dry-runs’...
We have created ZapC, a novel system for transparent coordinated checkpoint-restart of distributed network applications on commodity clusters. ZapC provides a thin virtualization ...
To be competitive, Enterprises are collecting and analyzing increasingly large amount of data in order to derive business insights. However, there are at least two challenges to m...
Process skew is an important factor in the performance of parallel applications, especially in large-scale clusters. Reduction is a common collective operation which, by its natur...
Adam Wagner, Darius Buntinas, Dhabaleswar K. Panda...
In this article we present the design choices and the evaluation of a batch scheduler for large clusters, named OAR. This batch scheduler is based upon an original design that emp...
Nicolas Capit, Georges Da Costa, Yiannis Georgiou,...