Abstract-- With the continuous downscaling of CMOS technologies, reliability has become a major bottleneck in the evolution of next-generation systems. Technology trends such as smaller transistor sizes, the use of new materials, and system-on-chip architectures continue to increase a system's sensitivity to soft errors. These errors are random and are not caused by permanent hardware faults. Their causes may be internal (e.g., interconnect coupling) or external (e.g., cosmic radiation). To meet system reliability requirements, both circuit designers and test engineers need a basic understanding of soft errors. We present a tutorial study of the single-event upset (SEU) phenomenon, which is a major cause of soft errors. We summarize the basic radiation mechanisms and the resulting soft errors in silicon. A soft error mitigation technique using time and space redundancy is illustrated. An industrial design example, the IBM z990 system, shows how the ind...
Fan Wang, Vishwani D. Agrawal