Continued scaling of CMOS technology to smaller transistor sizes makes modern processors more susceptible to both transient and permanent hardware faults. Circuitlevel techniques ...
Dense linear algebra codes are often expressed and coded in terms of BLAS calls. This approach, however, achieves suboptimal performance due to the overheads associated to such cal...
Abstract. With the number of computing elements spiraling to hundred of thousands in modern HPC systems, failures are common events. Few applications are nevertheless fault toleran...
George Bosilca, Aurelien Bouteiller, Thomas H&eacu...
Enterprises may use a service-oriented architecture (SOA) to provide a streamlined interface to their business processes. To scale up the system, each tier in a composite service ...
Programmers and users of compute intensive scientific applications often do not want to (or even cannot) code load balancing and fault tolerance into their programs. The PBEAM syst...