Fault-tolerance techniques based on checkpointing and message logging have been increasingly used in real-world applications to reduce service down-time. Most industrial applicati...
This paper explores hardware-implemented error-detection and security mechanisms embedded as modules in a hardware-level framework called the Reliability and Security Engine (RSE)...
Nithin Nakka, Zbigniew Kalbarczyk, Ravishankar K. ...
We introduce Membrane, a set of changes to the operating system to support restartable file systems. Membrane allows an operating system to tolerate a broad class of file system f...
Dual-execution/checkpointing based transient error tolerance techniques have been widely used in the high-end mission critical systems. These techniques, however, are not very att...
Abstract—In large distributed systems, where shared resources are owned by distinct entities, there is a need to reflect resource ownership in resource allocation. An appropriat...
Jose Nelson Falavinha Junior, Aleardo Manacero Jun...