We consider the problem of scheduling real-time distributable threads in large-scale network systems in the presence of node and link failures and message losses. We present a distributed scheduling algorithm called RTG-L, which uses gossip-based communication to discover eligible nodes dynamically and dependably. Gossip protocols traditionally incur high message overhead. We argue that this problem is less severe than commonly assumed, and we present a gossip-based message propagation protocol with lower message overhead. When scheduling local thread sections, RTG-L exploits slack to optimize its use of gossip time, and thereby satisfies end-to-end time constraints with probabilistic assurance. Our simulation studies verify our analytical results.
Kai Han, Binoy Ravindran, E. Douglas Jensen