Sciweavers

1652 search results - page 13 / 331
» Performance evaluation of adaptive MPI
Sort
View
HIPC
2007
Springer
14 years 1 months ago
A Scalable Asynchronous Replication-Based Strategy for Fault Tolerant MPI Applications
As computational clusters increase in size, their mean-time-to-failure reduces. Typically checkpointing is used to minimize the loss of computation. Most checkpointing techniques, ...
John Paul Walters, Vipin Chaudhary
PVM
2005
Springer
14 years 1 months ago
Cooperative Write-Behind Data Buffering for MPI I/O
Many large-scale production parallel programs often run for a very long time and require data checkpoint periodically to save the state of the computation for program restart and/o...
Wei-keng Liao, Kenin Coloma, Alok N. Choudhary, Le...
IPPS
2007
IEEE
14 years 2 months ago
Nonuniformly Communicating Noncontiguous Data: A Case Study with PETSc and MPI
Due to the complexity associated with developing parallel applications, scientists and engineers rely on highlevel software libraries such as PETSc, ScaLAPACK and PESSL to ease th...
Pavan Balaji, Darius Buntinas, Satish Balay, Barry...
FGCS
2008
140views more  FGCS 2008»
13 years 7 months ago
Blocking vs. non-blocking coordinated checkpointing for large-scale fault tolerant MPI Protocols
A long-term trend in high-performance computing is the increasing number of nodes in parallel computing platforms, which entails a higher failure probability. Fault tolerant progr...
Darius Buntinas, Camille Coti, Thomas Hérau...
CLUSTER
2008
IEEE
13 years 9 months ago
Efficient one-copy MPI shared memory communication in Virtual Machines
Efficient intra-node shared memory communication is important for High Performance Computing (HPC), especially with the emergence of multi-core architectures. As clusters continue ...
Wei Huang, Matthew J. Koop, Dhabaleswar K. Panda