Sciweavers

339 search results - page 40 / 68
» Modeling Faults of Distributed, Reactive Systems
IPPS
1998
IEEE
14 years 21 days ago
Migration and Rollback Transparency for Arbitrary Distributed Applications in Workstation Clusters
Programmers and users of compute-intensive scientific applications often do not want to (or even cannot) code load balancing and fault tolerance into their programs. The PBEAM syst...
Stefan Petri, Matthias Bolz, Horst Langendörf...
MIDDLEWARE
2009
Springer
14 years 3 months ago
Why Do Upgrades Fail and What Can We Do about It?
Enterprise-system upgrades are unreliable and often produce downtime or data loss. Errors in the upgrade procedure, such as broken dependencies, constitute the leading ca...
Tudor Dumitras, Priya Narasimhan
ISPA
2004
Springer
14 years 1 month ago
A Fault Tolerance Protocol for Uploads: Design and Evaluation
This paper investigates fault tolerance issues in Bistro, a wide-area upload architecture. In Bistro, clients first upload their data to intermediaries, known as bistros. A destin...
Leslie Cheung, Cheng-Fu Chou, Leana Golubchik, Yan...
CCGRID
2006
IEEE
14 years 2 months ago
Proposal of MPI Operation Level Checkpoint/Rollback and One Implementation
With the increasing number of processors in modern HPC (High Performance Computing) systems, there are two emergent problems to solve. One is scalability, the other is fault tolera...
Yuan Tang, Graham E. Fagg, Jack Dongarra
HPDC
1996
IEEE
14 years 18 days ago
The Core Legion Object Model
This document describes the core Legion object model. The model specifies the composition and functionality of Legion's core objects--those objects that cooperate to create, ...
Michael J. Lewis, Andrew S. Grimshaw