In this paper a model for scheduling in wide-area systems is described. The model is federated and utilizes a collection of local site schedulers that control the use of their resources. The wide-area scheduler consults the local site schedulers to obtain candidate machine schedules. A set of issues and challenges inherent to wide-area scheduling are also described and the proposed model is shown to address many of these problems. A distributed algorithm for wide-area scheduling is presented and relies upon information made available about the resource needs of user jobs. The wide-area scheduler will be implemented in Legion, a wide-area computing system developed at the University of Virginia.
Jon B. Weissman, Andrew S. Grimshaw