Sciweavers

3886 search results
» Implementing Fault-Tolerant Distributed Applications
CCGRID
2008
IEEE
13 years 7 months ago
Fault Tolerance and Recovery of Scientific Workflows on Computational Grids
In this paper, we describe the design and implementation of two mechanisms for fault-tolerance and recovery for complex scientific workflows on computational grids. We present our ...
Gopi Kandaswamy, Anirban Mandal, Daniel A. Reed
IPPS
2010
IEEE
13 years 5 months ago
Supporting fault tolerance in a data-intensive computing middleware
Over the last 2-3 years, the importance of data-intensive computing has increasingly been recognized, closely coupled with the emergence and popularity of map-reduce for developin...
Tekin Bicer, Wei Jiang, Gagan Agrawal
DEXAW
2004
IEEE
13 years 11 months ago
Using Data-Flow Analysis for Resilience and Result Checking in Peer-To-Peer Computations
To achieve correct execution of peer-to-peer applications on non-reliable resources, we present a portable and distributed algorithm that provides fault tolerance and result checki...
Samir Jafar, Sébastien Varrette, Jean-Louis...
HPCA
2003
IEEE
14 years 7 months ago
Dynamic Data Replication: An Approach to Providing Fault-Tolerant Shared Memory Clusters
A challenging issue in today's server systems is to transparently deal with failures and application-imposed requirements for continuous operation. In this paper we address t...
Rosalia Christodoulopoulou, Reza Azimi, Angelos Bi...
CLUSTER
2004
IEEE
13 years 11 months ago
Improved message logging versus improved coordinated checkpointing for fault tolerant MPI
Fault tolerance is a very important concern for critical high performance applications using the MPI library. Several protocols provide automatic and transparent fault detection a...
Pierre Lemarinier, Aurelien Bouteiller, Thomas H...