Abstract—Using MPI as the communication interface, one or several applications may introduce complex communication behaviors over the cluster network. This effect is increased when n...
High-performance computing clusters running long-lived tasks currently cannot have kernel software updates applied to them without causing system downtime. These clusters miss oppo...
The distributed recovery block (DRB) scheme is a widely applicable approach for realizing both hardware and software fault tolerance in real-time distributed and parallel compute...
Thread-Level Speculation (TLS) allows us to automatically parallelize general-purpose programs by supporting parallel execution of threads that might not actually be independent. ...
J. Gregory Steffan, Christopher B. Colohan, Antoni...
High chip-level integration enables the implementation of large-scale parallel processing architectures with 64 or more processing nodes on a single chip or on an FPGA device...
Mouna Baklouti, Yassine Aydi, Philippe Marquet, Je...