Abstract--We present a fault tolerant task pool execution environment that is capable of performing fine-grain selective restart using a lightweight, distributed task completion tr...
James Dinan, Arjun Singri, P. Sadayappan, Sriram K...
— Fault-tolerance is an important system metric for many operating environments, from automotive to space exploration. The conventional technique for improving system reliability...
John Lach, William H. Mangione-Smith, Miodrag Potk...
1 Over the last years, an increasing number of safety-critical tasks have been demanded to computer systems. In particular, safety-critical computer-based applications are hitting ...
Maurizio Rebaudengo, Matteo Sonza Reorda, Marco To...
Due to modern technology trends such as decreasing feature sizes and lower voltage levels, fault tolerance is becoming increasingly important in computing systems. Shared memory i...
The demand for an efficient fault tolerance system has led to the development of complex monitoring infrastructure, which in turn has created an overwhelming task of data and even...