Sciweavers

DSN 2006 (IEEE)
A large-scale study of failures in high-performance computing systems
Designing highly dependable systems requires a good understanding of failure characteristics. Unfortunately, little raw data on failures in large IT installations is publicly avai...
Bianca Schroeder, Garth A. Gibson
CORR 2006 (Springer)
Exact Failure Frequency Calculations for Extended Systems
This paper shows how the steady-state availability and failure frequency can be calculated in a single pass for very large systems, when the availability is expressed as a product...
Annie Druault-Vicard, Christian Tanguy
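The abstract above is truncated, so the paper's own single-pass method for extended systems is not reproduced here. As a hedged illustration of the two quantities it names, the sketch below covers only the textbook special case of a series system of independent, repairable components with exponentially distributed up and repair times (an assumption, not taken from the paper). In that case the system availability is the product A = ∏ μ_i / (λ_i + μ_i), and the failure frequency is f = A · Σ λ_i, because from the all-up state any single component failure takes the system down. Both are accumulated in one loop. The function name `series_availability_and_frequency` is hypothetical.

```python
def series_availability_and_frequency(components):
    """Steady-state availability and failure frequency of a series system.

    Hypothetical helper. Assumes independent components with exponential
    up/repair times; `components` is a list of (failure_rate, repair_rate)
    pairs. The system is up only when every component is up.
    """
    availability = 1.0         # running product of component availabilities
    total_failure_rate = 0.0   # running sum of component failure rates
    for lam, mu in components:
        # Steady-state availability of one component: mu / (lambda + mu)
        availability *= mu / (lam + mu)
        total_failure_rate += lam
    # Up -> down transition frequency: P(system up) * (sum of failure rates)
    return availability, availability * total_failure_rate
```

For two components each with λ = 1 and μ = 9 per unit time, each component is up 90% of the time, so A = 0.81 and f = 0.81 × 2 = 1.62 failures per unit time.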
ICPP 2007 (IEEE)
A Meta-Learning Failure Predictor for Blue Gene/L Systems
The demand for more computational power in science and engineering has spurred the design and deployment of ever-growing cluster systems. Even though the individual components use...
Prashasta Gujrati, Yawei Li, Zhiling Lan, Rajeev T...
EDCC 2005 (Springer)
Failure Detection with Booting in Partially Synchronous Systems
Unreliable failure detectors are a well-known means to enrich asynchronous distributed systems with time-free semantics that allow one to solve consensus in the presence of crash failu...
Josef Widder, Gérard Le Lann, Ulrich Schmid
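The snippet does not show how the authors handle booting. For orientation only, here is a sketch of the classic heartbeat-with-adaptive-timeout construction of an eventually perfect failure detector in a partially synchronous system, not the paper's method. A process is suspected when its heartbeat is overdue, and every false suspicion increases that process's timeout, so once message delays are bounded the timeouts eventually exceed the bound and false suspicions stop. The class name, its parameters, and the explicit `now` argument are hypothetical choices that keep the example clock-free.

```python
class EventuallyPerfectDetector:
    """Heartbeat failure detector with adaptive timeouts (hypothetical sketch).

    Time is passed in explicitly as `now` so the logic can be exercised
    without real clocks or networking.
    """

    def __init__(self, processes, initial_timeout=1.0, increment=1.0):
        self.timeout = {p: initial_timeout for p in processes}
        self.last_heard = {p: 0.0 for p in processes}
        self.suspected = set()
        self.increment = increment

    def on_heartbeat(self, p, now):
        """Record a heartbeat from process p received at time `now`."""
        self.last_heard[p] = now
        if p in self.suspected:
            # p was suspected but is alive: the suspicion was false, so
            # withdraw it and wait longer for p from now on.
            self.suspected.discard(p)
            self.timeout[p] += self.increment

    def query(self, now):
        """Suspect every process whose heartbeat is overdue; return a copy."""
        for p, last in self.last_heard.items():
            if now - last > self.timeout[p]:
                self.suspected.add(p)
        return set(self.suspected)
```

Passing `now` explicitly keeps the logic deterministic and testable; a real deployment would drive `query` from a local clock and receive heartbeats over the network.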
PODC 2009 (ACM)
The weakest failure detector for solving k-set agreement
A failure detector is a distributed oracle that provides processes in a distributed system with hints about failures. The notion of a weakest failure detector captures the exact a...
Eli Gafni, Petr Kuznetsov