We present in this paper an extension of the message-driven confidence-driven framework that we developed for onboard guarded software upgrading. The purpose of this work is to provide the framework with the capability of protecting distributed software upgrades that involve message-passing interface changes. To achieve this goal, we propose an approach to clustering the components involved in software upgrades and those involved in message-passing interface changes, such that from outside the cluster all those components can be perceived collectively as one virtual low-confidence component. Moreover, we develop a confidence-driven mechanism that enables combined use of sender- and receiver-side message logging for efficient, fine-grained error containment and recovery. The paper provides a detailed algorithm description.
Ann T. Tai, Kam S. Tso, William H. Sanders