Sciweavers

33 search results - page 3 / 7
» Improving the Fault Tolerance of a Computer System with Spac...
Sort
View
ICDCS
2012
IEEE
11 years 9 months ago
Combining Partial Redundancy and Checkpointing for HPC
Today’s largest High Performance Computing (HPC) systems exceed one Petaflops (1015 floating point operations per second) and exascale systems are projected within seven years...
James Elliott, Kishor Kharbas, David Fiala, Frank ...
EDCC
2008
Springer
13 years 9 months ago
A Transient-Resilient System-on-a-Chip Architecture with Support for On-Chip and Off-Chip TMR
The ongoing technological advances in the semiconductor industry make Multi-Processor System-on-a-Chips (MPSoCs) more attractive, because uniprocessor solutions do not scale satis...
Roman Obermaisser, Hubert Kraut, Christian El Sall...
EDCC
2006
Springer
13 years 11 months ago
SEU Mitigation Techniques for Microprocessor Control Logic
The importance of fault tolerance at the processor architecture level has been made increasingly important due to rapid advancements in the design and usage of high performance de...
T. S. Ganesh, Viswanathan Subramanian, Arun K. Som...
IPPS
1999
IEEE
13 years 11 months ago
A Dynamic Fault-Tolerant Mesh Architecture
A desired mesh architecture, based on connected-cycle modules, is constructed. To enhance the reliability, multiple bus sets and spare nodes are dynamically inserted to construct m...
Jyh-Ming Huang, Ted C. Yang
HPCC
2009
Springer
13 years 5 months ago
Reliability Optimization of Reconfigurable Computing-Based Fault-Tolerant System
Domain-partition (DP) model is a general model for reliability maximization problem under given redundancy. In this paper, an improved DP model is used to formulate a reconfigurati...
Mi Zhou, Lihong Shang, Yu Hu