Most research on publish/subscribe systems has not treated fault tolerance as a central design issue. However, faults do occur, and masking all of them is expensive at best and impossible at worst. A potential alternative (or sensible complement) to fault masking is self-stabilization, which allows a system to recover from arbitrary transient faults such as memory perturbations, communication errors, and process crashes with subsequent recoveries. In this paper we discuss how publish/subscribe systems can be made self-stabilizing by using self-stabilizing content-based routing. When the time between consecutive faults is long enough, corrupted parts of the routing tables are removed, correct parts are refreshed in time, and missing parts are inserted. To judge the efficiency of self-stabilizing content-based routing, we compare it to flooding, which is the naïve implementation of a self-stabilizing publish/subscribe system. We show that our approach is s...
Gero Mühl, Michael A. Jaeger, Klaus Herrmann,
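
The abstract's description of routing tables in which corrupted parts are removed, correct parts are refreshed, and missing parts are inserted can be illustrated with a lease-based table. The following is only a minimal sketch under assumptions: the abstract does not say how the paper realizes this behaviour, and the class `LeasedRoutingTable`, its methods, and the `lease` parameter are hypothetical names introduced for illustration. Each entry carries an expiry time; a periodic refresh renews or re-inserts it, and anything not refreshed within one lease period, or carrying an implausible expiry, is dropped.

```python
from dataclasses import dataclass, field


@dataclass
class LeasedRoutingTable:
    """Hypothetical lease-based routing table (illustrative assumption only)."""

    lease: float  # assumed lease period, in arbitrary time units
    # Maps (filter, neighbor) to the time at which the entry expires.
    entries: dict = field(default_factory=dict)

    def refresh(self, flt, neighbor, now):
        """Insert a missing entry or renew the lease of an existing one."""
        self.entries[(flt, neighbor)] = now + self.lease

    def purge(self, now):
        """Drop entries whose lease has run out.

        An expiry later than now + lease cannot result from a legitimate
        refresh, so it is treated as corrupted state and dropped as well.
        """
        self.entries = {
            key: expiry
            for key, expiry in self.entries.items()
            if now < expiry <= now + self.lease
        }

    def destinations(self, matches, now):
        """Return the neighbors whose stored filter satisfies `matches`."""
        self.purge(now)
        return {neighbor for (flt, neighbor) in self.entries if matches(flt)}
```

In this sketch, a corrupted entry survives for at most one lease period because nobody refreshes it, while a legitimate entry stays alive as long as its subscriber keeps renewing it. This matches the abstract's condition that the time between consecutive faults must be long enough for the tables to recover.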