Sciweavers

1256 search results - page 20 / 252
» On Coordinated Checkpointing in Distributed Systems
CCGRID
2006
IEEE
14 years 2 months ago
Proposal of MPI Operation Level Checkpoint/Rollback and One Implementation
With the increasing number of processors in modern HPC (High Performance Computing) systems, there are two emergent problems to solve. One is scalability, the other is fault tolera...
Yuan Tang, Graham E. Fagg, Jack Dongarra
PVM
2005
Springer
14 years 2 months ago
New User-Guided and ckpt-Based Checkpointing Libraries for Parallel MPI Applications
We present design and implementation details as well as performance results for two new parallel checkpointing libraries developed by us for parallel MPI applications. The first o...
Pawel Czarnul, Marcin Fraczak
PDCAT
2009
Springer
14 years 3 months ago
CheCUDA: A Checkpoint/Restart Tool for CUDA Applications
In this paper, a tool named CheCUDA is designed to checkpoint CUDA applications that use GPUs as accelerators. As existing checkpoint/restart implementations do not supp...
Hiroyuki Takizawa, Katsuto Sato, Kazuhiko Komatsu,...
EUSFLAT
2003
13 years 10 months ago
Genetic fuzzy systems to evolve coordination strategies in competitive distributed systems
This paper suggests an evolutionary approach to design coordination strategies, a key issue in distributed intelligent systems. We focus on competitive strategies in the form of f...
Igor Walter, Fernando A. C. Gomide