Search Sciweavers | Sciweavers

12 search results - page 1 / 3

» Fault tolerant MapReduce-MPI for HPC clusters

CLUSTER
2011
IEEE

Dynamic Load Balance for Optimized Message Logging in Fault Tolerant HPC Applications

14 years 5 months ago

Download charm.cs.illinois.edu

Computing systems will grow significantly larger in the near future to satisfy the needs of computational scientists in areas like climate modeling, biophysics and cosmology. S...

Esteban Meneses, Laxmikant V. Kalé, Greg Br...

ICS
2007
Tsinghua U.

Proactive fault tolerance for HPC with Xen virtualization

16 years 9 hour ago

Download www.csm.ornl.gov

Large-scale parallel computing is relying increasingly on clusters with thousands of processors. At such large counts of compute nodes, faults are becoming commonplace. Current t...

Arun Babu Nagarajan, Frank Mueller, Christian Enge...

HPDC
2009
IEEE

Interconnect agnostic checkpoint/restart in open MPI

16 years 18 days ago

Download www.osl.iu.edu

Long running High Performance Computing (HPC) applications at scale must be able to tolerate inevitable faults if they are to harness current and future HPC systems. Message Passi...

Joshua Hursey, Timothy Mattox, Andrew Lumsdaine

ISPA
2004
Springer

Highly Reliable Linux HPC Clusters: Self-Awareness Approach

15 years 11 months ago

Download xcr.cenit.latech.edu

Abstract. Current solutions for fault-tolerance in HPC systems focus on dealing with the result of a failure. However, most are unable to handle runtime system configuration change...

Chokchai Leangsuksun, Tong Liu, Yudan Liu, Stephen...

IJHPCA
2006

MPICH-V Project: A Multiprotocol Automatic Fault-Tolerant MPI

15 years 5 months ago

Download www.cs.utk.edu

Abstract. High performance computing platforms like Clusters, Grid and Desktop Grids are becoming larger and subject to more frequent failures. MPI is one of the most used message...

Aurelien Bouteiller, Thomas Hérault, G...
