Sciweavers

HPCA
2009
IEEE
14 years 7 months ago
Accurate microarchitecture-level fault modeling for studying hardware faults
Decreasing hardware reliability is expected to impede the exploitation of increasing integration projected by Moore's Law. There is much ongoing research on efficient fault t...
Man-Lap Li, Pradeep Ramachandran, Ulya R. Karpuzcu...
SOSP
2003
ACM
14 years 3 months ago
Separating agreement from execution for Byzantine fault tolerant services
We describe a new architecture for Byzantine fault tolerant state machine replication that separates agreement that orders requests from execution that processes requests. This se...
Jian Yin, Jean-Philippe Martin, Arun Venkataramani...
SRDS
2005
IEEE
14 years 9 days ago
Agile Store: Experience with Quorum-Based Data Replication Techniques for Adaptive Byzantine Fault Tolerance
Quorum protocols offer several benefits when used to maintain replicated data but techniques for reducing overheads associated with them have not been explored in detail. It is d...
Lei Kong, Deepak J. Manohar, Mustaque Ahamad, Arun...
CLUSTER
2004
IEEE
13 years 10 months ago
Improved message logging versus improved coordinated checkpointing for fault tolerant MPI
Fault tolerance is a very important concern for critical high performance applications using the MPI library. Several protocols provide automatic and transparent fault detection a...
Pierre Lemarinier, Aurelien Bouteiller, Thomas H...
ICPP
2007
IEEE
14 years 1 month ago
Fault-Driven Re-Scheduling For Improving System-level Fault Resilience
The productivity of HPC systems is determined not only by their performance, but also by their reliability. The conventional method to limit the impact of failures is checkpointing...
Yawei Li, Prashasta Gujrati, Zhiling Lan, Xian-He ...