Sciweavers

» Real-Time Distributed Discrete-Event Execution with Fault To...
CCGRID
2010
IEEE
13 years 8 months ago
Selective Recovery from Failures in a Task Parallel Programming Model
We present a fault tolerant task pool execution environment that is capable of performing fine-grain selective restart using a lightweight, distributed task completion tr...
James Dinan, Arjun Singri, P. Sadayappan, Sriram K...
CCGRID
2008
IEEE
13 years 9 months ago
Fault Tolerance in Cluster Federations with O2P-CF
Fault tolerance is one of the key issues for large scale applications executed on high performance computing systems. In a cluster federation, clusters are gathered to provide hug...
Thomas Ropars, Christine Morin
EUROPAR
2008
Springer
13 years 9 months ago
Fault-Tolerant Partial Replication in Large-Scale Database Systems
We investigate a decentralised approach to committing transactions in a replicated database, under partial replication. Previous protocols either reexecute transactions entirely an...
Pierre Sutra, Marc Shapiro
ICPP
1987
IEEE
13 years 11 months ago
A Software-Based Hardware Fault Tolerance Scheme for Multicomputers
A hardware fault tolerance scheme for large multicomputers executing time-consuming non-interactive applications is described. Error detection and recovery are done mostly by so...
Yuval Tamir, Eli Gafni
ISCA
2011
IEEE
12 years 11 months ago
Sampling + DMR: practical and low-overhead permanent fault detection
With technology scaling, manufacture-time and in-field permanent faults are becoming a fundamental problem. Multi-core architectures with spares can tolerate them by detecting an...
Shuou Nomura, Matthew D. Sinclair, Chen-Han Ho, Ve...