Failures of all forms happen: from losing single network packets to site-wide disasters. Since businesses rely heavily on their data, it is imperative that failures require minima...
This paper investigates the practical issues in applying network tomography to monitor failures. We outline an approach for selecting paths to monitor, detecting and confirming th...
– A safety-critical real-time computer system must provide its services with a dependability that is much better than the dependability of any one of its constituent components. ...
Failure Mode and Effect Analysis (FMEA) is a method for assessing cause-consequence relations between component faults and hazards that may occur during the lifetime of a system. ...
Diagnosing production run failures is a challenging yet important task. Most previous work focuses on offsite diagnosis, i.e. development site diagnosis with the programmers prese...
Joseph Tucek, Shan Lu, Chengdu Huang, Spiros Xanth...