We present a scheme to guarantee that the execution of real-time tasks can tolerate transient and intermittent faults, assuming any queue-based scheduling technique. The scheme is based on reserving sufficient slack in a schedule such that a task can be re-executed before its deadline without compromising guarantees given to other tasks. Only enough slack is reserved in the schedule to guarantee fault tolerance if at most one fault occurs within a time interval. This results in increased schedulability and a very low percentage of deadline misses even if no restriction is placed on the fault separation. We provide two algorithms to solve the problem of adding fault tolerance to a queue of real-time tasks. The first is an optimal dynamic programming solution, and the second is a greedy heuristic that closely approximates the optimal.
Sunondo Ghosh, Rami G. Melhem, Daniel Mossé