Fault-tolerant techniques that can cope with system failures in software distributed shared memory (SDSM) are essential for creating productive and highly available parallel compu...
We investigate the use of distributed measurements for estimating and updating the performance of a cellular system. Specifically, we discuss the number and placement of sensors in...
Liang Xiao, Larry J. Greenstein, Narayan B. Manday...
This paper presents a new fault injection tool called Exhaustif (Exhaustive Workbench for Systems Reliability). Exhaustif is a SWIFI fault injection tool for fault tolerance verif...
In this paper, we propose a new, efficient logging protocol, called lazy logging, and a fast crash recovery protocol, called the prefetch-based crash recovery (PCR), for software ...
TreadMarks is a distributed shared memory (DSM) system for standard Unix systems such as SunOS and Ultrix. This paper presents a performance evaluation of TreadMarks running on Ultr...
Peter J. Keleher, Alan L. Cox, Sandhya Dwarkadas, ...