Fault tolerance is a very important concern for critical high performance applications using the MPI library. Several protocols provide automatic and transparent fault detection a...
Pierre Lemarinier, Aurelien Bouteiller, Thomas H&e...
In this work we present a parallel algorithm for the solution of a least squares problem with structured matrices. This problem arises in many applications mainly related to digit...
Pedro Alonso, Antonio M. Vidal, Alexey L. Lastovet...
With the ever-increasing numbers of cores per node on HPC systems, applications are increasingly using threads to exploit the shared memory within a node, combined with MPI across ...
Many large-scale production parallel programs run for a very long time and require periodic data checkpoints to save the state of the computation for program restart and/o...
Wei-keng Liao, Kenin Coloma, Alok N. Choudhary, Le...
A high-level understanding of how an application executes and which performance characteristics it exhibits is essential in many areas of high performance computing, such as applic...