In a Computational Grid which consists of many computer clusters, job start time predictions are useful to guide resource selections and balance the workload distribution. However, the basic Grid middleware available today either has no means of expressing the time that a site will take before starting a job or uses a simple linear scale. In this paper, we introduce a system for predicting job start times on clusters. Our predictions are based on statistical analysis of historical job traces and simulation of site schedulers. We have deployed the system on the EDG (European DataGrid) production cluster at NIKHEF. The experimental results show that acceptable prediction accuracy is achieved to reflect real site states and site-specific scheduling policies. We find that the average error of our job start time predictions is 18.9 percent of the average job queue wait time and this is around 20 times smaller than the average prediction error using the EDG solution.
Hui Li, David L. Groep, Jeffrey Templon, Lex Wolte