Sciweavers

212 search results - page 19 / 43
» Supporting fault tolerance in a data-intensive computing mid...
ICDCS
2007
IEEE
14 years 1 month ago
Fault Tolerance in Multiprocessor Systems Via Application Cloning
Record and Replay (RR) is a software based state replication solution designed to support recording and subsequent replay of the execution of unmodified applications running on mu...
Philippe Bergheaud, Dinesh Subhraveti, Marc Vertes
ICDCS
1998
IEEE
13 years 12 months ago
Low-Overhead Protocols for Fault-Tolerant File Sharing
In this paper, we quantify the adverse effect of file sharing on the performance of reliable distributed applications. We demonstrate that file sharing incurs significant overhead...
Lorenzo Alvisi, Sriram Rao, Harrick M. Vin
NSDI
2010
13 years 9 months ago
Prophecy: Using History for High-Throughput Fault Tolerance
Byzantine fault-tolerant (BFT) replication has enjoyed a series of performance improvements, but remains costly due to its replicated work. We eliminate this cost for read-mostly ...
Siddhartha Sen, Wyatt Lloyd, Michael J. Freedman
ICS
2007
Tsinghua U.
14 years 1 month ago
Proactive fault tolerance for HPC with Xen virtualization
Large-scale parallel computing is relying increasingly on clusters with thousands of processors. At such large counts of compute nodes, faults are becoming commonplace. Current t...
Arun Babu Nagarajan, Frank Mueller, Christian Enge...
HASE
1997
IEEE
13 years 11 months ago
High-Coverage Fault Tolerance in Real-Time Systems Based on Point-to-Point Communication
The distributed recovery block (DRB) scheme is a widely applicable approach for realizing both hardware and software fault tolerance in real-time distributed and parallel compute...
K. H. Kim, Chittur Subbaraman, Eltefaat Shokri