Sciweavers

2226 search results - page 28 / 446
» Fault-Tolerant Parallel Applications with Dynamic Parallel S...
Sort
View
CCGRID
2006
IEEE
14 years 1 months ago
MPI-Mitten: Enabling Migration Technology in MPI
Group communications are commonly used in parallel and distributed environment. However, existing migration mechanisms do not support group communications. This weakness prevents ...
Cong Du, Xian-He Sun
PVM
2010
Springer
13 years 6 months ago
Dodging the Cost of Unavoidable Memory Copies in Message Logging Protocols
Abstract. With the number of computing elements spiraling to hundred of thousands in modern HPC systems, failures are common events. Few applications are nevertheless fault toleran...
George Bosilca, Aurelien Bouteiller, Thomas H&eacu...
PPAM
2005
Springer
14 years 1 months ago
A Web Computing Environment for Parallel Algorithms in Java
We present a web computing library (PUBWCL) in Java that allows to execute tightly coupled, massively parallel algorithms in the bulk-synchronous (BSP) style on PCs distributed ove...
Olaf Bonorden, Joachim Gehweiler, Friedhelm Meyer ...
CCGRID
2008
IEEE
14 years 2 months ago
An Autonomic Workflow Management System for Global Grids
Workflow Management System is generally utilized to define, manage and execute workflow applications on Grid resources. However, the increasing scale complexity, heterogeneity and...
Mustafizur Rahman 0003, Rajkumar Buyya
IPPS
2007
IEEE
14 years 2 months ago
DejaVu: Transparent User-Level Checkpointing, Migration, and Recovery for Distributed Systems
In this paper, we present a new fault tolerance system called DejaVu for transparent and automatic checkpointing, migration, and recovery of parallel and distributed applications....
Joseph F. Ruscio, Michael A. Heffner, Srinidhi Var...