Secure, fault-tolerant distributed systems are difficult to build, to validate, and to operate. Conservative design for such systems dictates that their security and fault toleran...
MapReduce has become prevalent for running data-parallel applications. By hiding non-functional concerns such as parallelism, fault tolerance, and load balancing from programmers,...
Shengkai Zhu, Zhiwei Xiao, Haibo Chen, Rong Chen, ...
ACE analysis is a technique that provides an early reliability estimate for microprocessors. ACE analysis couples data from performance models with low-level design details to identi...
As technology scales and the energy of computation continually approaches thermal equilibrium [1,2], parameter variations and noise levels will lead to larger error rates at vario...
Modern scientific experiments can generate large amounts of data, which may be replicated and distributed across multiple resources to improve application performance and fault to...