Internet measurements have shown that network failures happen frequently, and that existing routing protocols can take multiple seconds, or even minutes, to converge after a failu...
Dan Pei, Lan Wang, Daniel Massey, Shyhtsun Felix W...
Many fault-tolerant group communication middleware systems have been implemented assuming crash failure semantics. While this assumption is not unreasonable, it becomes hard to ju...
Dimane Mpoeleng, Paul D. Ezhilchelvan, Neil A. Spe...
Wireless Medium Access Control (MAC) protocols such as IEEE 802.11 use distributed contention resolution mechanisms for sharing the wireless channel. In this environment, selfish...
A hierarchical modeling framework for the dependability evaluation of Internet-based applications is presented and illustrated on a travel agency example. Modeling is carried out ...
This paper describes an experimental study of Linux kernel behavior in the presence of errors that impact the instruction stream of the kernel code. Extensive error injection exper...
Weining Gu, Zbigniew Kalbarczyk, Ravishankar K. Iy...
Our goal is to automatically obtain a distributed and fault-tolerant embedded system: distributed because the system must run on a distributed architecture; fault-tolerant because...
We present a new approach that uses compiler-directed fault-injection for coverage testing of recovery code in Internet services to evaluate their robustness to operating ...
Chen Fu, Richard P. Martin, Kiran Nagaraja, Thu D....