Sciweavers

441 search results - page 37 / 89
» Generic Timing Fault Tolerance using a Timely Computing Base
HIPC
2009
Springer
13 years 5 months ago
Fast checkpointing by Write Aggregation with Dynamic Buffer and Interleaving on multicore architecture
Large scale compute clusters continue to grow to ever-increasing proportions. However, as clusters and applications continue to grow, the Mean Time Between Failures (MTBF) has redu...
Xiangyong Ouyang, Karthik Gopalakrishnan, Tejus Ga...
ICTAC
2005
Springer
14 years 1 month ago
Revisiting Failure Detection and Consensus in Omission Failure Environments
It has recently been shown that fair exchange, a security problem in distributed systems, can be reduced to a fault tolerance problem, namely a special form of distributed consensu...
Carole Delporte-Gallet, Hugues Fauconnier, Felix C...
SC
2009
ACM
14 years 2 months ago
Kestrel: an XMPP-based framework for many task computing applications
This paper presents a new distributed computing framework for Many Task Computing (MTC) applications, based on the Extensible Messaging and Presence Protocol (XMPP). A lightweight...
Lance Stout, Michael A. Murphy, Sebastien Goasguen
SAC
2006
ACM
13 years 7 months ago
Combining supervised and unsupervised monitoring for fault detection in distributed computing systems
Fast and accurate fault detection is becoming an essential component of management software for mission-critical systems. A good fault detector makes it possible to initiate repair a...
Haifeng Chen, Guofei Jiang, Cristian Ungureanu, Ke...
ATS
2000
IEEE
14 years 3 days ago
Compaction-based test generation using state and fault information
We present a new test generation procedure for sequential circuits using newly traversed state and newly detected fault information obtained between successive iterations of vecto...
Ashish Giani, Shuo Sheng, Michael S. Hsiao, Vishwa...