Sciweavers

» Fault tolerant high performance computing by a coding approa...
DSN
2004
IEEE
13 years 11 months ago
Efficient Byzantine-Tolerant Erasure-Coded Storage
This paper describes a decentralized consistency protocol for survivable storage that exploits local data versioning within each storage-node. Such versioning enables the protocol...
Garth R. Goodson, Jay J. Wylie, Gregory R. Ganger,...
HPDC
2010
IEEE
13 years 8 months ago
ROARS: a scalable repository for data intensive scientific computing
As scientific research becomes more data intensive, there is an increasing need for scalable, reliable, and high performance storage systems. Such data repositories must provide b...
Hoang Bui, Peter Bui, Patrick J. Flynn, Douglas Th...
LCTRTS
2009
Springer
14 years 2 months ago
A compiler optimization to reduce soft errors in register files
The register file (RF) is extremely vulnerable to soft errors, and traditional redundancy-based schemes to protect the RF are prohibitive not only because the RF is often in the timing c...
Jongeun Lee, Aviral Shrivastava
ICPP
2007
IEEE
14 years 1 month ago
Fault-Driven Re-Scheduling For Improving System-level Fault Resilience
The productivity of HPC systems is determined not only by their performance, but also by their reliability. The conventional method to limit the impact of failures is checkpointing...
Yawei Li, Prashasta Gujrati, Zhiling Lan, Xian-He ...
CASES
2009
ACM
14 years 2 months ago
Towards scalable reliability frameworks for error prone CMPs
As technology scales and the energy of computation continually approaches thermal equilibrium [1,2], parameter variations and noise levels will lead to larger error rates at vario...
Joseph Sloan, Rakesh Kumar