Abstract—This paper describes a method to implement faulttolerant services in distributed systems based on the idea of fused state machines. The theory of fused state machines us...
We build on PTIDES, a programming model for distributed embedded systems that uses discrete-event (DE) models as program specifications. PTIDES improves on distributed DE executi...
As computational clusters increase in size, their mean-time-to-failure reduces. Typically checkpointing is used to minimize the loss of computation. Most checkpointing techniques, ...
In this paper, we present a new online failure forecast system to achieve predictive failure management for fault-tolerant data stream processing. Different from previous reactive ...
Xiaohui Gu, Spiros Papadimitriou, Philip S. Yu, Sh...
Over the last 2-3 years, the importance of data-intensive computing has increasingly been recognized, closely coupled with the emergence and popularity of map-reduce for developin...