Sciweavers

799 search results - page 17 / 160
» On Failures and Faults
SRDS
1999
IEEE
13 years 11 months ago
Fault-Tolerant Replication Management in Large-Scale Distributed Storage Systems
Failures of all forms happen: from losing single network packets to site-wide disasters. Since businesses rely heavily on their data, it is imperative that failures require minima...
Richard A. Golding, Elizabeth Borowsky
CCR
2008
13 years 7 months ago
Practical issues with using network tomography for fault diagnosis
This paper investigates the practical issues in applying network tomography to monitor failures. We outline an approach for selecting paths to monitor, detecting and confirming th...
Yiyi Huang, Nick Feamster, Renata Teixeira
ASWSD
2004
Springer
14 years 27 days ago
On the Fault Hypothesis for a Safety-Critical Real-Time System
A safety-critical real-time computer system must provide its services with a dependability that is much better than the dependability of any one of its constituent components. ...
Hermann Kopetz
QEST
2007
IEEE
14 years 1 months ago
Probabilistic Model-Checking Support for FMEA
Failure Mode and Effect Analysis (FMEA) is a method for assessing cause-consequence relations between component faults and hazards that may occur during the lifetime of a system. ...
Lars Grunske, Robert Colvin, Kirsten Winter
SOSP
2007
ACM
14 years 4 months ago
Triage: diagnosing production run failures at the user's site
Diagnosing production run failures is a challenging yet important task. Most previous work focuses on offsite diagnosis, i.e. development site diagnosis with the programmers prese...
Joseph Tucek, Shan Lu, Chengdu Huang, Spiros Xanth...