We present SGuard, a new fault-tolerance technique for distributed stream processing engines (SPEs) running in clusters of commodity servers. SGuard is less disruptive to normal s...
YongChul Kwon, Magdalena Balazinska, Albert G. Gre...
This work addresses the issue of design optimization for fault-tolerant hard real-time systems. In particular, our focus is on the handling of transient faults using both checkpoin...
Petru Eles, Viacheslav Izosimov, Paul Pop, Zebo Pe...
In distributed systems that use active replication to achieve robustness, it is important to efficiently enforce consistency among replicas. The nonblocking mode helps to speed u...
Today most Internet services are pre-assigned to servers statically, hence preventing us from doing real-time sharing of a pool of servers across a group of services with dynamic...