Sciweavers

» Optimal recovery schemes in fault tolerant distributed compu...
CLUSTER
2004
IEEE
14 years 5 days ago
FTC-Charm++: an in-memory checkpoint-based fault tolerant runtime for Charm++ and MPI
As high performance clusters continue to grow in size, the mean time between failure shrinks. Thus, the issues of fault tolerance and reliability are becoming one of the challengi...
Gengbin Zheng, Lixia Shi, Laxmikant V. Kalé
ECRTS
2000
IEEE
14 years 25 days ago
Tolerating faults while maximizing reward
The imprecise computation (IC) model is a general scheduling framework, capable of expressing the precision vs. timeliness trade-off involved in many current real-time applications...
Hakan Aydin, Rami G. Melhem, Daniel Mossé
CCGRID
2008
IEEE
13 years 8 months ago
Fault Tolerance and Recovery of Scientific Workflows on Computational Grids
In this paper, we describe the design and implementation of two mechanisms for fault-tolerance and recovery for complex scientific workflows on computational grids. We present our ...
Gopi Kandaswamy, Anirban Mandal, Daniel A. Reed
SC
2009
ACM
14 years 3 months ago
Supporting fault-tolerance for time-critical events in distributed environments
In this paper, we consider the problem of supporting fault tolerance for adaptive and time-critical applications in heterogeneous and unreliable grid computing environments. Our g...
Qian Zhu, Gagan Agrawal
EUROMICRO
2009
IEEE
14 years 9 days ago
Fault-Tolerant BPEL Workflow Execution via Cloud-Aware Recovery Policies
BPEL is the de facto standard for business process modeling in today's enterprises and is a promising candidate for the integration of business and scientific applications tha...
Ernst Juhnke, Tim Dörnemann, Bernd Freisleben