reason with abstracted models of the behaviours they use to construct plans. When plans are turned into the instructions that drive an executive, the real behaviours interacting w...
Enterprise systems must guarantee high availability and reliability to provide 24/7 services without interruptions and failures. Mechanisms for handling exceptional cases and impl...
Distributed storage systems often use data replication to mask failures and guarantee high data availability. Node failures can be transient or permanent. While the system must ge...
Jing Tian, Zhi Yang, Wei Chen, Ben Y. Zhao, Yafei ...
What do our computer systems do all day? How do we make sure they continue doing it when failures occur? Traditional approaches to answering these questions often involve inband m...
Dan Pelleg, Muli Ben-Yehuda, Richard Harper, Lisa ...
The one issue that unites almost all approaches to distributed computing is the need to know whether certain components in the system have failed or are otherwise unavailable. Whe...