Fault tolerance will be a fundamental imperative in the next decade as machines containing hundreds of thousands of cores will be installed at various locations. In this context, ...
Esteban Meneses, Celso L. Mendes, Laxmikant V. Kal...
As the number of processors in today’s high performance computers continues to grow, the mean-time-to-failure of these computers is becoming significantly shorter than the exe...
Zizhong Chen, Graham E. Fagg, Edgar Gabriel, Julie...
In this paper we consider the following scenario. A set of n jobs with different threads is being run concurrently. Each job has an associated weight, which gives the proportion ...
Micah Adler, Petra Berenbrink, Tom Friedetzky, Les...
Communication latencies within critical sections constitute a major bottleneck in some classes of emerging parallel workloads. In this paper, we argue for the use of Inferentially...
We consider the problem of broadcasting a large message in a large scale distributed platform. The message must be sent from a source node, with the help of the receiving peers whi...