We describe the communication infrastructure (CI) for our fault-tolerant cluster middleware, which is optimized for two classes of communication: for the applications and for the ...
Ming Li, Wenchao Tao, Daniel Goldberg, Israel Hsu,...
Increasingly there is a demand for more scalable fault management schemes to cope with the ever increasing growth and complexity of modern networks. Current distributed fault co...
Fault management in high performance cluster networks has been focused on the notion of hard faults (i.e., link or node failures). Network degradations that negatively impact per...
Jeffrey J. Evans, Seongbok Baik, Cynthia S. Hood, ...
Validation of distributed systems using fault injection is difficult because of their inherent complexity, lack of a global clock, and lack of an easily accessible notion of a gl...
Ramesh Chandra, Michel Cukier, Ryan M. Lefever, Wi...
Project managers use inspection data as input to capture-recapture (CR) models to estimate the total number of faults present in a software artifact. The CR models use the number ...