Fine-grained parallel applications require all their processes to run simultaneously on distinct processors to achieve good efficiency. This is typically accomplished by space slicing, wherein nodes are dedicated for the duration of the run, or by gang scheduling, wherein time slicing is coordinated across processors. Both schemes suffer from fragmentation, where processors are left idle because jobs cannot be packed with perfect efficiency. This reduces utilization and degrades performance. Flexible coscheduling (FCS) addresses this problem by monitoring each job’s granularity and communication activity, and using gang scheduling only for those jobs that require it. Processes from other jobs, which can be scheduled without any constraints, are used as filler to reduce fragmentation. Because the classification is done on a per-process basis, inefficiencies due to load imbalance and hardware heterogeneity are reduced as well. FCS has been fully imple...
Eitan Frachtenberg, Dror G. Feitelson, Fabrizio Pe