Sciweavers

535 search results - page 17 / 107
» Fault tolerant high performance computing by a coding approa...
Sort
View
ASPLOS
2006
ACM
14 years 1 months ago
Dependable != unaffordable
This paper presents a software architecture for hardware fault tolerance based on loosely-synchronized, redundant virtual machines (LSRVM). LSRVM will provide high levels of relia...
Alan L. Cox, Kartik Mohanram, Scott Rixner
LCPC
2000
Springer
13 years 11 months ago
SmartApps: An Application Centric Approach to High Performance Computing
State-of-the-art run-time systems are a poor match to diverse, dynamic distributed applications because they are designed to provide support to a wide variety of applications, with...
Lawrence Rauchwerger, Nancy M. Amato, Josep Torrel...
CLUSTER
2004
IEEE
13 years 11 months ago
Improved message logging versus improved coordinated checkpointing for fault tolerant MPI
Fault tolerance is a very important concern for critical high performance applications using the MPI library. Several protocols provide automatic and transparent fault detection a...
Pierre Lemarinier, Aurelien Bouteiller, Thomas H&e...
ISPA
2004
Springer
14 years 29 days ago
Highly Reliable Linux HPC Clusters: Self-Awareness Approach
Abstract. Current solutions for fault-tolerance in HPC systems focus on dealing with the result of a failure. However, most are unable to handle runtime system configuration change...
Chokchai Leangsuksun, Tong Liu, Yudan Liu, Stephen...
SASO
2007
IEEE
14 years 1 months ago
e-SAFE: An Extensible, Secure and Fault Tolerant Storage System
With the rapidly falling price of hardware, and increasingly available bandwidth, the storage technology is seeing a paradigm shift from centralized and managed mode to distribute...
Sandip Agarwala, Arnab Paul, Umakishore Ramachandr...