Current trends suggest that future software systems will rely on service-discovery protocols to combine and recombine distributed services dynamically in response to changing conditions. We investigate the ability of selected service-discovery protocol designs to support real-time distributed control applications by detecting and recovering from the failure of remote services. We model the two architectures (two-party and three-party) that underlie most commercial service-discovery systems. We use simulation to quantify the functional effectiveness achieved by each architecture as the failure rate of remote services increases. We further decompose non-functional periods into failure-detection delay and recovery delay. Our quantitative measurements suggest that a two-party architecture yields better robustness than a three-party architecture. We discuss the underlying causes for this outcome.
Christopher Dabrowski, Kevin L. Mills, Andrew L. R