Sciweavers

CCGRID
2006
IEEE

A Failure-Aware Scheduling Strategy in Large-Scale Cluster System

As cluster systems grow in scale, node failure becomes commonplace. Job scheduling, an important part of cluster operating system software, is responsible for efficient resource management and reasonable scheduling of jobs. Job scheduling in a cluster divides into two sub-parts: job selection and node allocation. In this paper, we introduce a failure-aware scheduling strategy, the LUNF (Longest Uptime Node First) node allocation policy, which uses the failure characteristics of nodes. Simulation results show that the LUNF policy yields better system performance than a random node allocation policy.
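The abstract names the two node allocation policies but does not spell out their mechanics. The following is a minimal sketch of how they might differ, assuming that LUNF simply ranks idle nodes by their current uptime and takes the longest-running ones first. The `Node` class, its `uptime_hours` field, and both function names are hypothetical and do not come from the paper.

```python
import random
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical cluster node record (not taken from the paper)."""
    name: str
    uptime_hours: float  # assumed metric: time since the node last came up
    busy: bool = False


def allocate_lunf(nodes, count):
    """Assumed LUNF behaviour: pick `count` idle nodes, longest uptime first."""
    idle = [n for n in nodes if not n.busy]
    if len(idle) < count:
        return None  # not enough idle nodes, so the job has to wait
    chosen = sorted(idle, key=lambda n: n.uptime_hours, reverse=True)[:count]
    for n in chosen:
        n.busy = True
    return chosen


def allocate_random(nodes, count, rng=random):
    """Baseline policy: pick `count` idle nodes uniformly at random."""
    idle = [n for n in nodes if not n.busy]
    if len(idle) < count:
        return None
    chosen = rng.sample(idle, count)
    for n in chosen:
        n.busy = True
    return chosen
```

Both functions cover only the node allocation half of scheduling. Job selection, meaning which queued job runs next, is assumed to happen before either function is called.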
Added 10 Jun 2010
Updated 10 Jun 2010
Type Conference
Year 2006
Where CCGRID
Authors Linping Wu, Dan Meng, Jianfeng Zhan, Wang Lei, Bibo Tu