Sciweavers

162 search results - page 13 / 33
» Distributed evaluation functions for fault tolerant multi-ro...
Sort
View
HPCA
2003
IEEE
14 years 8 months ago
Dynamic Data Replication: An Approach to Providing Fault-Tolerant Shared Memory Clusters
A challenging issue in today's server systems is to transparently deal with failures and application-imposed requirements for continuous operation. In this paper we address t...
Rosalia Christodoulopoulou, Reza Azimi, Angelos Bi...
HIPC
2007
Springer
14 years 1 months ago
A Scalable Asynchronous Replication-Based Strategy for Fault Tolerant MPI Applications
As computational clusters increase in size, their mean-time-to-failure reduces. Typically checkpointing is used to minimize the loss of computation. Most checkpointing techniques, ...
John Paul Walters, Vipin Chaudhary
COMPSAC
1998
IEEE
13 years 12 months ago
Architecture of ROAFTS/Solaris: A Solaris-Based Middleware for Real-Time Object-Oriented Adaptive Fault Tolerance Support
Middleware implementation of various critical services required by large-scale and complex real-time applications on top of COTS operating system is currently an approach of growi...
Eltefaat Shokri, Patrick Crane, K. H. Kim, Chittur...
ICPP
2007
IEEE
14 years 1 months ago
Fault-Driven Re-Scheduling For Improving System-level Fault Resilience
The productivity of HPC system is determined not only by their performance, but also by their reliability. The conventional method to limit the impact of failures is checkpointing...
Yawei Li, Prashasta Gujrati, Zhiling Lan, Xian-He ...
NSDI
2010
13 years 9 months ago
Prophecy: Using History for High-Throughput Fault Tolerance
Byzantine fault-tolerant (BFT) replication has enjoyed a series of performance improvements, but remains costly due to its replicated work. We eliminate this cost for read-mostly ...
Siddhartha Sen, Wyatt Lloyd, Michael J. Freedman