Abstract. In this paper we focus on design of a class of distributed embedded systems that primarily perform real-time controlling tasks. We propose a two-layer component model for...
The soaring demands for always-on and fast-response online services have driven modern datacenter networks to undergo tremendous growth. These networks often rely on scale-out des...
Xin Wu, Daniel Turner, Chao-Chih Chen, David A. Ma...
As the scale of high-performance computing (HPC) continues to grow, failure resilience of parallel applications becomes crucial. In this paper, we present FT-Pro, an adaptive fault...
— Designing and implementing artificial self-organizing systems is a challenging task since they typically behave nonintuitive and no theoretical foundations exist. Predicting a...
This paper concerns the validity of a widely used method for estimating the architecture-level mean time to failure (MTTF) due to soft errors. The method first calculates the fai...
Xiaodong Li, Sarita V. Adve, Pradip Bose, Jude A. ...