Over the past decade the number of processors in the high performance facilities went up to hundreds of thousands. As a direct consequence, while the computational power follow th...
Aurelien Bouteiller, George Bosilca, Jack Dongarra
: Intrusion tolerance is a recent approach to deal with intentional and malicious failures. It combines the research on fault tolerance with the research on security, and relies on...
In the next five years, the number of processors in high-end systems for scientific computing is expected to rise to tens and even hundreds of thousands. For example, the IBM Blu...
In traditional distributed simulation schemes, entire simulation needs to be restarted if any of the participating LP crashes. This is highly undesirable for long running simulati...
In distributed computing systems, processes in different hosts take checkpoints to survive failures. For mobile computing systems, due to certain new characteristics conventional d...