Atomic Broadcast (where all processes deliver broadcast messages in the same order) is a very useful group communication primitive for building fault-tolerant distributed systems. This paper presents an atomic broadcast protocol that can be claimed to be optimal in terms of failure detection, resilience, and latency. The protocol requires only the weakest of the useful failure detectors for liveness, and permits upto (n-1)/2 processes to crash in a system of n processes; at most two communication steps and n broadcasts are needed in a run during which process crashes and failure-suspicions do not occur. We also introduce the notion of Notifying Broadcast which can reduce the message overhead further in ’nice’ runs in which all processes are operational and communication delays do not exceed the bound assumed. If nice runs persist, the average message overhead is just one broadcast. That is, the protocol extracts no message overhead for providing crash-tolerance if process failures...
Paul D. Ezhilchelvan, Doug Palmer, Michel Raynal