The productivity of HPC systems is determined not only by their performance but also by their reliability. The conventional method to limit the impact of failures is checkpointing...

Yawei Li, Prashasta Gujrati, Zhiling Lan, Xian-He ...

ICPPW 2009, IEEE

Analyzing Checkpointing Trends for Applications on the IBM Blue Gene/P System

Current petascale systems have tens of thousands of hardware components and complex system software stacks, which increase the probability of faults occurring during the lifetime ...

Harish Gapanati Naik, Rinku Gupta, Pete Beckman

CCGRID 2006, IEEE

Proposal of MPI Operation Level Checkpoint/Rollback and One Implementation

With the increasing number of processors in modern HPC (High Performance Computing) systems, there are two emerging problems to solve. One is scalability; the other is fault tolera...

Yuan Tang, Graham E. Fagg, Jack Dongarra

IPPS 2008, IEEE

Large-scale experiment of co-allocation strategies for Peer-to-Peer supercomputing in P2P-MPI

High-performance computing generally involves deploying parallel applications on the multiple resources used for the computation. The problem of scheduling the applicat...

Stéphane Genaud, Choopan Rattanapoka
