Grid systems are well known for their high-performance computing or large data storage with inexpensive devices. They can be categorized into two major types: computational grid and ...
To be able to fully exploit ever larger computing platforms, modern HPC applications and system software must be able to tolerate inevitable faults. Historically, MPI implementati...
Joshua Hursey, Jeffrey M. Squyres, Timothy Mattox,...
In this paper we present BioOpera, an extensible process support system for cluster-aware computing. It features an intuitive way to specify computations, as well as improved supp...
Communication of large data volumes is a core functionality of distributed systems middleware, namely, for interconnecting components, for distributed computation and for fault tol...
We present a tunable diagnostic protocol for generic time-triggered (TT) systems to detect crash and send/receive omission faults. Compared to existing diagnostic and membership p...
Marco Serafini, Neeraj Suri, Jonny Vinter, Astrit ...