In recent years, an increasing number of safety-critical tasks have been entrusted to computer systems. In particular, safety-critical computer-based applications are hitting ...
Maurizio Rebaudengo, Matteo Sonza Reorda, Marco To...
Ultra-large-scale (ULS) systems are future software-intensive systems that have billions of lines of code, composed of heterogeneous, changing, inconsistent and independent elem...
As scientific research becomes more data-intensive, there is an increasing need for scalable, reliable, and high-performance storage systems. Such data repositories must provide b...
Hoang Bui, Peter Bui, Patrick J. Flynn, Douglas Th...
Long-running High Performance Computing (HPC) applications at scale must be able to tolerate inevitable faults if they are to harness current and future HPC systems. Message Passi...
A grid checkpointing service providing migration and transparent fault tolerance is important for distributed and parallel applications executed in heterogeneous grids. I...