Sciweavers

55 search results - page 8 / 11
» Tolerating Radiation-Induced Transient Faults in Modern Proc...
Sort
View
HPCA
2009
IEEE
14 years 8 months ago
Versatile prediction and fast estimation of Architectural Vulnerability Factor from processor performance metrics
The shrinking processor feature size, lower threshold voltage and increasing clock frequency make modern processors highly vulnerable to transient faults. Architectural Vulnerabil...
Lide Duan, Bin Li, Lu Peng
DSN
2002
IEEE
14 years 13 days ago
Ditto Processor
Concentration of design effort for current single-chip Commercial-Off-The-Shelf (COTS) microprocessors has been directed towards performance. Reliability has not been the primary ...
Shih-Chang Lai, Shih-Lien Lu, Jih-Kwon Peir
CCGRID
2003
IEEE
14 years 23 days ago
Improved Read Performance in a Cost-Effective, Fault-Tolerant Parallel Virtual File System (CEFT-PVFS)
Due to the ever-widening performance gap between processors and disks, I/O operations tend to become the major performance bottleneck of data-intensive applications on modern clus...
Yifeng Zhu, Hong Jiang, Xiao Qin, Dan Feng, David ...
ISCA
2002
IEEE
115views Hardware» more  ISCA 2002»
14 years 12 days ago
SafetyNet: Improving the Availability of Shared Memory Multiprocessors with Global Checkpoint/Recovery
We develop an availability solution, called SafetyNet, that uses a unified, lightweight checkpoint/recovery mechanism to support multiple long-latency fault detection schemes. At...
Daniel J. Sorin, Milo M. K. Martin, Mark D. Hill, ...
CCGRID
2006
IEEE
14 years 1 months ago
Proposal of MPI Operation Level Checkpoint/Rollback and One Implementation
With the increasing number of processors in modern HPC(High Performance Computing) systems, there are two emergent problems to solve. One is scalability, the other is fault tolera...
Yuan Tang, Graham E. Fagg, Jack Dongarra