Sciweavers

511 search results - page 80 / 103
» A Model for Space-Correlated Failures in Large-Scale Distrib...
PNPM
1987
13 years 12 months ago
Stochastic Petri Net Analysis of a Replicated File System
We present a stochastic Petri net model of a replicated file system in a distributed environment where replicated files reside on different hosts and a voting algorithm is used to...
Joanne Bechta Dugan, Gianfranco Ciardo
ISPA
2004
Springer
14 years 1 month ago
Highly Reliable Linux HPC Clusters: Self-Awareness Approach
Current solutions for fault-tolerance in HPC systems focus on dealing with the result of a failure. However, most are unable to handle runtime system configuration change...
Chokchai Leangsuksun, Tong Liu, Yudan Liu, Stephen...
RTAS
2010
IEEE
13 years 6 months ago
Middleware for Resource-Aware Deployment and Configuration of Fault-Tolerant Real-time Systems
Developing large-scale distributed real-time and embedded (DRE) systems is hard in part due to complex deployment and configuration issues involved in satisfying multiple quality f...
Jaiganesh Balasubramanian, Aniruddha S. Gokhale, A...
CLUSTER
2003
IEEE
14 years 1 month ago
Coordinated Checkpoint versus Message Log for Fault Tolerant MPI
Large Clusters, high availability clusters and Grid deployments often suffer from network, node or operating system faults and thus require the use of fault tolerant programmin...
Aurelien Bouteiller, Pierre Lemarinier, Gér...
SRDS
2007
IEEE
14 years 2 months ago
The Fail-Heterogeneous Architectural Model
Fault tolerant distributed protocols typically utilize a homogeneous fault model, either fail-crash or fail-Byzantine, where all processors are assumed to fail in the same manner....
Marco Serafini, Neeraj Suri