The RAIN (Reliable Array of Independent Nodes) project at Caltech focuses on creating highly reliable distributed systems by leveraging commercially available personal computers, workstations, and interconnect technologies. In particular, the issue of reliable communication is addressed by introducing redundancy in the form of multiple network interfaces per compute node. When compute nodes have multiple network connections, the question arises of how to determine connectivity between nodes. We examine a connectivity protocol that guarantees that each side of a point-to-point connection sees the same history of activity over the communication channel. In other words, we maintain a consistent history of the state of the communication channel. At any given moment in time, the histories as seen by each side are guaranteed to be identical to within some number of transitions. This bound on how much one side may lead or lag the other is the slack. Our main contributions are: (i) a si...
Paul S. LeMahieu, Jehoshua Bruck
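The bounded-slack property stated in the abstract can be illustrated with a small check. The following is a minimal sketch and not the protocol itself: it assumes that each side's history is a sequence of channel-state transitions (here the hypothetical labels "up" and "down"), and that "identical to within some number of transitions" means one history is a prefix of the other, with the length difference bounded by the slack. The function name `within_slack` and its parameters are illustrative assumptions, not names from the paper.

```python
from typing import Sequence


def within_slack(history_a: Sequence[str],
                 history_b: Sequence[str],
                 slack: int) -> bool:
    """Check an assumed reading of the bounded-slack property.

    Returns True when the two histories agree on every transition
    they have both recorded, and the side that is ahead leads by at
    most `slack` transitions.
    """
    shared = min(len(history_a), len(history_b))
    # The histories must agree on their common prefix.
    if list(history_a[:shared]) != list(history_b[:shared]):
        return False
    # The leading side may be ahead by at most `slack` transitions.
    return abs(len(history_a) - len(history_b)) <= slack


# Hypothetical histories: side A has seen one more transition than side B.
side_a = ["up", "down", "up"]
side_b = ["up", "down"]

print(within_slack(side_a, side_b, slack=1))
print(within_slack(side_a, side_b, slack=0))
print(within_slack(["up", "down"], ["up", "up"], slack=2))
```

Under these assumptions, the first call succeeds because side A leads by exactly one transition, the second fails because no lead is permitted, and the third fails because the two sides disagree on a transition they have both recorded.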