Sciweavers

» Transparent Fault Tolerance for Parallel Applications on Net...
HPDC 2009 (IEEE)
Interconnect agnostic checkpoint/restart in Open MPI
Long running High Performance Computing (HPC) applications at scale must be able to tolerate inevitable faults if they are to harness current and future HPC systems. Message Passi...
Joshua Hursey, Timothy Mattox, Andrew Lumsdaine
HPCC 2007 (Springer)
FTSCP: An Efficient Distributed Fault-Tolerant Service Composition Protocol for MANETs
Service composition, which enables users to construct complex services from atomic services, is an essential feature for the usability of Mobile Ad hoc Networks (MANETs)....
Zhen-guo Gao, Sheng Liu, Ming Ji, Jinhua Zhao, Lih...
ICDCS 2007 (IEEE)
Fault Tolerance in Multiprocessor Systems Via Application Cloning
Record and Replay (RR) is a software-based state replication solution designed to support recording and subsequent replay of the execution of unmodified applications running on mu...
Philippe Bergheaud, Dinesh Subhraveti, Marc Vertes
GPC 2007 (Springer)
Fault Management in P2P-MPI
In this paper we present recent developments in P2P-MPI, a grid middleware, concerning fault management, which covers fault-tolerance for applications and fault detect...
Stéphane Genaud, Choopan Rattanapoka
IPPS 2002 (IEEE)
Fault-Tolerance in the Network Storage Stack
This paper addresses the issue of fault-tolerance in applications that make use of network storage. A network abstraction called the Network Storage Stack is presented, along with...
Scott Atchley, Stephen Soltesz, James S. Plank, Mi...