Sciweavers

86 search results - page 7 / 18
» Hybrid checkpointing for parallel applications in cluster fe...
Sort
View
IPPS
2005
IEEE
14 years 1 months ago
Fault-Tolerant Parallel Applications with Dynamic Parallel Schedules
Commodity computer clusters are often composed of hundreds of computing nodes. These generally off-the-shelf systems are not designed for high reliability. Node failures therefore...
Sebastian Gerlach, Roger D. Hersch
CLUSTER
2004
IEEE
13 years 11 months ago
FTC-Charm++: an in-memory checkpoint-based fault tolerant runtime for Charm++ and MPI
As high performance clusters continue to grow in size, the mean time between failure shrinks. Thus, the issues of fault tolerance and reliability are becoming one of the challengi...
Gengbin Zheng, Lixia Shi, Laxmikant V. Kalé
ICDCS
1997
IEEE
13 years 11 months ago
Supporting Dynamic Space-sharing on Clusters of Non-dedicated Workstations
Clusters of workstations are increasingly being viewed as a cost-e ective alternative to parallel supercomputers. However, resource management and scheduling on workstations clust...
Abdur Chowdhury, Lisa D. Nicklas, Sanjeev Setia, E...
ICS
1999
Tsinghua U.
13 years 11 months ago
Fast cluster failover using virtual memory-mapped communication
This paper proposes a novel way to use virtual memorymapped communication (VMMC) to reduce the failover time on clusters. With the VMMC model, applications’ virtual address spac...
Yuanyuan Zhou, Peter M. Chen, Kai Li
CLUSTER
2006
IEEE
13 years 11 months ago
Lightweight I/O for Scientific Applications
Today's high-end massively parallel processing (MPP) machines have thousands to tens of thousands of processors, with next-generation systems planned to have in excess of one...
Ron Oldfield, Lee Ward, Rolf Riesen, Arthur B. Mac...