Search Sciweavers | Sciweavers

HIPC
2007
Springer

A Scalable Asynchronous Replication-Based Strategy for Fault Tolerant MPI Applications

14 years 1 month ago

As computational clusters increase in size, their mean-time-to-failure reduces. Typically checkpointing is used to minimize the loss of computation. Most checkpointing techniques, ...

John Paul Walters, Vipin Chaudhary

PVM
2005
Springer

Cooperative Write-Behind Data Buffering for MPI I/O

14 years 1 month ago

Download cucis.ece.northwestern.edu

Many large-scale production parallel programs often run for a very long time and require data checkpoint periodically to save the state of the computation for program restart and/o...

Wei-keng Liao, Kenin Coloma, Alok N. Choudhary, Le...

IPPS
2007
IEEE

Nonuniformly Communicating Noncontiguous Data: A Case Study with PETSc and MPI

14 years 2 months ago

Download www.mcs.anl.gov

Due to the complexity associated with developing parallel applications, scientists and engineers rely on high-level software libraries such as PETSc, ScaLAPACK and PESSL to ease th...

Pavan Balaji, Darius Buntinas, Satish Balay, Barry...

FGCS
2008

Blocking vs. non-blocking coordinated checkpointing for large-scale fault tolerant MPI Protocols

13 years 7 months ago

Download www.public.iastate.edu

A long-term trend in high-performance computing is the increasing number of nodes in parallel computing platforms, which entails a higher failure probability. Fault tolerant progr...

Darius Buntinas, Camille Coti, Thomas Hérau...

CLUSTER
2008
IEEE

Efficient one-copy MPI shared memory communication in Virtual Machines

13 years 9 months ago

Download nowlab.cse.ohio-state.edu

Efficient intra-node shared memory communication is important for High Performance Computing (HPC), especially with the emergence of multi-core architectures. As clusters continue ...

Wei Huang, Matthew J. Koop, Dhabaleswar K. Panda
