Search Sciweavers | Sciweavers

442 search results - page 63 / 89

» Fault Tolerant Wide-Area Parallel Computing

MIDDLEWARE
2009
Springer

Why Do Upgrades Fail and What Can We Do about It?

16 years 20 days ago

Download www.ece.cmu.edu

Abstract. Enterprise-system upgrades are unreliable and often produce downtime or data loss. Errors in the upgrade procedure, such as broken dependencies, constitute the leading ca...

Tudor Dumitras, Priya Narasimhan

HCW
2000
IEEE

Evaluation of PAMS' Adaptive Management Services

15 years 10 months ago

Download acl.ece.arizona.edu

Management of large-scale parallel and distributed applications is an extremely complex task due to factors such as centralized management architectures, lack of coordination and ...

Yoonhee Kim, Salim Hariri, Muhamad Djunaedi

HPDC
2009
IEEE

Interconnect agnostic checkpoint/restart in Open MPI

16 years 26 days ago

Download www.osl.iu.edu

Long-running High Performance Computing (HPC) applications at scale must be able to tolerate inevitable faults if they are to harness current and future HPC systems. Message Passi...

Joshua Hursey, Timothy Mattox, Andrew Lumsdaine

IPPS
2007
IEEE

Achieving Reliable Parallel Performance in a VoD Storage Server Using Randomization and Replication

16 years 12 days ago

Download cobweb.ecn.purdue.edu

This paper investigates randomization and replication as strategies to achieve reliable performance in disk arrays targeted for video-on-demand (VoD) workloads. A disk array can p...

Yung Ryn Choe, Vijay S. Pai

HPDC
2007
IEEE

Failure-aware checkpointing in fine-grained cycle sharing systems

16 years 13 days ago

Download www.ecn.purdue.edu

Fine-Grained Cycle Sharing (FGCS) systems aim at utilizing the large amount of idle computational resources available on the Internet. Such systems allow guest jobs to run on a ho...

Xiaojuan Ren, Rudolf Eigenmann, Saurabh Bagchi
