Search Sciweavers | Sciweavers

81 search results - page 15 / 17

» Challenging the Mean Time to Failure: Measuring Dependabilit...

CCGRID
2006
IEEE

131 views, Distributed And Parallel Com...

Proposal of MPI Operation Level Checkpoint/Rollback and One Implementation

14 years 1 month ago

Download icl.cs.utk.edu

With the increasing number of processors in modern HPC (High Performance Computing) systems, there are two emergent problems to solve. One is scalability, the other is fault tolera...

Yuan Tang, Graham E. Fagg, Jack Dongarra

ICPP
2009
IEEE

185 views, Distributed And Parallel Com...

Accelerating Checkpoint Operation by Node-Level Write Aggregation on Multicore Systems

14 years 2 months ago

Download nowlab.cse.ohio-state.edu

Clusters and applications continue to grow in size while their mean time between failure (MTBF) is getting smaller. Checkpoint/Restart is becoming increasingly important for lar...

Xiangyong Ouyang, Karthik Gopalakrishnan, Dhabales...

CLUSTER
2004
IEEE

140 views, Distributed And Parallel Com...

FTC-Charm++: an in-memory checkpoint-based fault tolerant runtime for Charm++ and MPI

13 years 11 months ago

Download charm.cs.uiuc.edu

As high performance clusters continue to grow in size, the mean time between failure shrinks. Thus, the issues of fault tolerance and reliability are becoming one of the challengi...

Gengbin Zheng, Lixia Shi, Laxmikant V. Kalé

INFOCOM
2005
IEEE

138 views, Communications

Topology aware overlay networks

14 years 1 month ago

Download www.eecs.umich.edu

Recently, overlay networks have emerged as a means to enhance end-to-end application performance and availability. Overlay networks attempt to leverage the inherent redundancy ...

Junghee Han, David Watson, Farnam Jahanian

MAGS
2010

97 views

Towards reliable multi-agent systems: An adaptive replication mechanism

13 years 6 months ago

Download pagesperso-systeme.lip6.fr

Distributed cooperative applications (e.g., e-commerce) are now increasingly being designed as a set of autonomous entities, named agents, which interact and coordinate (...

Zahia Guessoum, Jean-Pierre Briot, Nora Faci, Oliv...
