Sciweavers

3886 search results - page 13 / 778
» Implementing Fault-Tolerant Distributed Applications
Sort
View
CCGRID
2008
IEEE
15 years 4 months ago
Fault Tolerance and Recovery of Scientific Workflows on Computational Grids
In this paper, we describe the design and implementation of two mechanisms for fault-tolerance and recovery for complex scientific workflows on computational grids. We present our ...
Gopi Kandaswamy, Anirban Mandal, Daniel A. Reed
143
Voted
IPPS
2010
IEEE
15 years 2 months ago
Supporting fault tolerance in a data-intensive computing middleware
Over the last 2-3 years, the importance of data-intensive computing has increasingly been recognized, closely coupled with the emergence and popularity of map-reduce for developin...
Tekin Bicer, Wei Jiang, Gagan Agrawal
DEXAW
2004
IEEE
132views Database» more  DEXAW 2004»
15 years 7 months ago
Using Data-Flow Analysis for Resilience and Result Checking in Peer-To-Peer Computations
To achieve correct execution of peer-to-peer applications on non-reliable resources, we present a portable and distributed algorithm that provides fault tolerance and result checki...
Samir Jafar, Sébastien Varrette, Jean-Louis...
HPCA
2003
IEEE
16 years 4 months ago
Dynamic Data Replication: An Approach to Providing Fault-Tolerant Shared Memory Clusters
A challenging issue in today's server systems is to transparently deal with failures and application-imposed requirements for continuous operation. In this paper we address t...
Rosalia Christodoulopoulou, Reza Azimi, Angelos Bi...
144
Voted
CLUSTER
2004
IEEE
15 years 8 months ago
Improved message logging versus improved coordinated checkpointing for fault tolerant MPI
Fault tolerance is a very important concern for critical high performance applications using the MPI library. Several protocols provide automatic and transparent fault detection a...
Pierre Lemarinier, Aurelien Bouteiller, Thomas H&e...