Today’s largest High Performance Computing (HPC) systems exceed one Petaflops (10^15 floating point operations per second) and exascale systems are projected within seven years...
James Elliott, Kishor Kharbas, David Fiala, Frank ...
This paper describes a modeling framework for evaluating the impact of faults on the output of streaming applications. Our model is based on three abstractions: stream operators, strea...
Gabriela Jacques-Silva, Zbigniew Kalbarczyk, Bugra...
We address the problem of making online, parallel query plans fault-tolerant: i.e., provide intra-query fault-tolerance without blocking. We develop an approach that not only achi...
Application mobility is an efficient way to mask uneven conditioning and reduce users’ distractions in pervasive environments. However, since mobility brings more dynamism and...
Off-The-Shelf (OTS) software components are being used within complex safety-critical applications. However, to use these untrustworthy components with confidence, it is necessary...
In this paper we propose an approach to the design optimization of fault-tolerant hard real-time embedded systems, which combines hardware and software fault tolerance techniques...
Viacheslav Izosimov, Ilia Polian, Paul Pop, Petru ...
The designation “fault tolerant software” has been used for techniques ranging from roll-back and retry to N-version programming, from data mirroring to functional redundancy....
The development of dependable software systems is a costly undertaking. Fault tolerance techniques as well as self-repair capabilities usually result in additional system complexi...
We analyze the effect of errors in branch predictors, a representative example of speculative processor subsystems, to motivate the necessity for fault tolerance in such subsystem...