This paper describes GulfStream, a scalable distributed software system designed to manage the network topology of a multi-domain server farm. In particular, it addresses topology discovery and verification, and failure detection. Unlike most topology discovery and failure detection systems, which focus on the nodes in a cluster, GulfStream logically organizes the network adapters of the server farm into groups. Each group contains those adapters that can directly exchange messages. GulfStream dynamically establishes a hierarchy for reporting network topology and the availability of network adapters. We describe a prototype implementation of GulfStream on a 55-node heterogeneous server farm interconnected using switched Fast Ethernet.
Sameh A. Fakhouri, Germán S. Goldszmidt, Mi