Sciweavers

186 search results - page 22 / 38
» Real-Time Distributed Discrete-Event Execution with Fault To...
HPCC
2010
Springer
13 years 7 months ago
A Generic Execution Management Framework for Scientific Applications
Managing the execution of scientific applications in a heterogeneous grid computing environment can be a daunting task, particularly for long running jobs. Increasing fault tolera...
Tanvire Elahi, Cameron Kiddle, Rob Simmonds
IPPS
2007
IEEE
14 years 1 month ago
Implementing and Evaluating Automatic Checkpointing
As the size and popularity of computer clusters go on growing, fault tolerance is becoming a crucial factor to ensure high performance and reliability for applications. To provide...
Antonio S. Martins, Ronaldo Augusto Lara Gonç...
SRDS
2007
IEEE
14 years 1 month ago
The Fail-Heterogeneous Architectural Model
Fault tolerant distributed protocols typically utilize a homogeneous fault model, either fail-crash or fail-Byzantine, where all processors are assumed to fail in the same manner....
Marco Serafini, Neeraj Suri
IPPS
1998
IEEE
13 years 11 months ago
A Generalized Forward Recovery Checkpointing Scheme
We propose a generalized forward recovery checkpointing scheme, with lookahead execution and rollback validation. This method takes advantage of voting and comparison on multiple v...
Ke Huang, Jie Wu, Eduardo B. Fernández
CODES
2007
IEEE
14 years 1 month ago
Scheduling and voltage scaling for energy/reliability trade-offs in fault-tolerant time-triggered embedded systems
In this paper we present an approach to the scheduling and voltage scaling of low-power fault-tolerant hard real-time applications mapped on distributed heterogeneous embedded sys...
Paul Pop, Kåre Harbo Poulsen, Viacheslav Izo...