The amount of physical variation among electronic components on a die is increasing rapidly. This creates a need both for a better understanding of how transient fault susceptibility varies and for methods of adapting on-line to such variation. We address three key research questions in this area. First, we investigate accelerated characterization of individual latch susceptibilities, and find that on the order of 10 upsets per latch must be observed for variations to be adequately characterized. Second, we propose a method of on-line hardware reconfiguration using incremental place-and-route on FPGAs. Surprisingly, we find that highly localized place-and-route changes (e.g., restricted to groups of 8 flip-flops) are sufficient to realize most of the possible benefit. Finally, we quantify potential improvements in system-level soft error rates via Monte Carlo simulation. The study highlights both what is required for on-line adaptation and what can be gained from it.
Kenneth M. Zick, John P. Hayes
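As a rough illustration of why the number of observed upsets per latch governs how well susceptibility can be characterized, the Python sketch below assumes that each latch's upsets under accelerated testing follow a Poisson process. This is an assumption made for illustration, not a statement of the authors' method. Under that model, the relative standard error of a latch's estimated upset rate is 1/sqrt(N) for N expected upsets, which is about 0.32 at N = 10. The function names are hypothetical, and the sketch does not reproduce the study's criterion for "adequately characterized."

```python
import math
import random
import statistics


def poisson_relative_std_error(expected_upsets: float) -> float:
    """Relative standard error of an upset-rate estimate from a Poisson count.

    Assumption: upsets arrive as a Poisson process. A count N with mean lam
    has variance lam, so the rate estimate N / exposure has a relative
    standard deviation of sqrt(lam) / lam = 1 / sqrt(lam).
    """
    if expected_upsets <= 0:
        raise ValueError("expected_upsets must be positive")
    return 1.0 / math.sqrt(expected_upsets)


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) variate using Knuth's multiplication method.

    Adequate for the small means used here; cost grows linearly with lam.
    """
    threshold = math.exp(-lam)
    k = 0
    product = rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def empirical_relative_error(lam: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo check of the analytic formula.

    Simulates many hypothetical latch tests, each expected to observe `lam`
    upsets, and returns the spread of the observed counts relative to their
    mean (population standard deviation divided by mean).
    """
    rng = random.Random(seed)
    counts = [sample_poisson(lam, rng) for _ in range(trials)]
    return statistics.pstdev(counts) / statistics.mean(counts)


if __name__ == "__main__":
    # Compare the analytic and simulated relative error for several
    # expected upset counts per latch.
    for n in (1, 3, 10, 30, 100):
        analytic = poisson_relative_std_error(n)
        simulated = empirical_relative_error(n, trials=20000, seed=1)
        print(f"{n:>4} upsets: analytic {analytic:.3f}, simulated {simulated:.3f}")
```

Under this model, each tenfold increase in observed upsets shrinks the relative error only by a factor of about 3.16, which suggests why tightening the precision target quickly raises the cost of per-latch accelerated testing.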