As modern supercomputing systems reach the petaflop performance range, they grow in both size and complexity. This makes them increasingly vulnerable to failures from a variety o...
Greg Bronevetsky, Daniel Marques, Keshav Pingali, ...
This paper compares parallel and distributed implementations of an iterative, Gibbs sampling, machine learning algorithm. Distributed implementations run under Hadoop on facilit...
Sebastien Bratieres, Jurgen Van Gael, Andreas Vlac...
Many mathematical models have been proposed to evaluate the execution performance of an application with and without checkpointing in the presence of failures. They assume that th...
The Earth’s tectonic plates are strong, viscoelastic shells which make up the outermost part of a thermally convecting, predominantly viscous layer; at the boundaries between pla...
Louis Moresi, David May, Justin Freeman, Bill F. A...
P2P storage systems use replication to provide a certain level of availability. While the system must generate new replicas to replace replicas lost to permanent failures, it ca...