Operator mistakes have been identified as a significant source of unavailability in Internet services. In this paper, we propose a new language, A, in which service engineers write assertions about expected behaviors, proper configurations, and proper structural characteristics. This formalized specification of correct behavior can bolster system understanding and help flag operator mistakes in a distributed system. Such mistakes can stem from anything from static misconfiguration to the physical placement of wires and machines. The language, along with its associated runtime system, seeks to be flexible and robust enough to handle this wide array of operator mistakes while maintaining a simple interface for designers and programmers.
Andrew Tjang, Fábio Oliveira, Richard P. Ma