Sciweavers

339 search results for "Modeling Faults of Distributed, Reactive Systems"
HPCA
2009
IEEE
Accurate microarchitecture-level fault modeling for studying hardware faults
Decreasing hardware reliability is expected to impede the exploitation of increasing integration projected by Moore's Law. There is much ongoing research on efficient fault t...
Man-Lap Li, Pradeep Ramachandran, Ulya R. Karpuzcu...
ICPADS
1998
IEEE
The XBW Model for Dependable Real-Time Systems
This paper presents a new conceptual model, the XBW Model. Distributed computing is becoming a cost-effective way to implement safety-critical control systems. To support the devel...
Vilgot Claesson, Stefan Poledna, Jan Söderber...
ICDCS
2012
IEEE
Combining Partial Redundancy and Checkpointing for HPC
Today’s largest High Performance Computing (HPC) systems exceed one Petaflops (10^15 floating-point operations per second) and exascale systems are projected within seven years...
James Elliott, Kishor Kharbas, David Fiala, Frank ...
IANDC
2007
The reactive simulatability (RSIM) framework for asynchronous systems
We define reactive simulatability for general asynchronous systems. Roughly, simulatability means that a real system implements an ideal system (specification) in a way that pre...
Michael Backes, Birgit Pfitzmann, Michael Waidner
ICS
2007
Tsinghua U.
Proactive fault tolerance for HPC with Xen virtualization
Large-scale parallel computing is relying increasingly on clusters with thousands of processors. At such large counts of compute nodes, faults are becoming commonplace. Current t...
Arun Babu Nagarajan, Frank Mueller, Christian Enge...