Sciweavers

2400 search results - page 73 / 480
» Systems Failures
PODC 2005, ACM
Building scalable and robust peer-to-peer overlay networks for broadcasting using network coding
We propose a scheme for building peer-to-peer overlay networks for broadcasting using network coding. The scheme addresses many practical issues such as scalability, robustness, c...
Kamal Jain, László Lovász, Ph...
EUROPAR 2005, Springer
Faults in Large Distributed Systems and What We Can Do About Them
Scientists are increasingly using large distributed systems built from commodity off-the-shelf components to perform scientific computation. Grid computing has expanded the scale ...
George Kola, Tevfik Kosar, Miron Livny
CAI 2010, Springer
Achieving Cost-Effective Software Reliability Through Self-Healing
Heterogeneity, mobility, complexity and new application domains raise new software reliability issues that cannot be met cost-effectively only with classic software engineering ap...
Alessandra Gorla, Mauro Pezzè, Jochen Wuttk...
IPPS 2005, IEEE
Performance Implications of Periodic Checkpointing on Large-Scale Cluster Systems
Large-scale systems like BlueGene/L are susceptible to a number of software and hardware failures that can affect system performance. Periodic application checkpointing is a commo...
Adam J. Oliner, Ramendra K. Sahoo, José E. ...
NSDI 2007
Beyond One-Third Faulty Replicas in Byzantine Fault Tolerant Systems
Byzantine fault tolerant systems behave correctly when no more than f out of 3f + 1 replicas fail. When there are more than f failures, traditional BFT protocols make no guarantee...
Jinyuan Li, David Mazières
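
The replica bound quoted in the last abstract (at most f faulty replicas out of 3f + 1) can be illustrated with a little arithmetic. The sketch below is not taken from the paper; the function names are hypothetical and it only encodes the standard n >= 3f + 1 relationship.

```python
def replicas_needed(f: int) -> int:
    """Replicas required to tolerate f Byzantine faults under the 3f + 1 bound."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def max_tolerated_faults(n: int) -> int:
    """Largest f such that n >= 3f + 1, for a system of n replicas."""
    if n < 1:
        raise ValueError("need at least one replica")
    return (n - 1) // 3


if __name__ == "__main__":
    # Print the tolerated fault count for a few illustrative cluster sizes.
    for n in (4, 6, 7, 10):
        print(n, "replicas tolerate", max_tolerated_faults(n), "faulty")
```

Note that going from 4 to 6 replicas does not raise the tolerated fault count under this bound; a seventh replica is needed to tolerate a second fault.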