Gang scheduling and related techniques are widely believed to be necessary for efficient job scheduling on distributed memory parallel computers. This is because they minimize context switching overheads and permit the parallel job currently running to progress at the fastest possible rate. However, in the case of cluster computers, and particularly those with COTS networks, these benefits can be outweighed in the multiple job time-sharing context by the loss of the ability to utilize the CPU for other jobs when the current job is waiting for messages. Experiments on a Linux Beowulf cluster with 100 Mb fast Ethernet switches are made comparing the SCore buddy-based gang scheduling with local scheduling (provided by the Linux 2.4 kernel with MPI implemented over TCP/IP). Results for communication-intensive numerical applications on 16 nodes reveal that gang scheduling results in 'slowdowns' up to a factor of two greater for 8 simultaneous jobs. This phenomenon is not due to any deficiencies in SCore ...
Peter E. Strazdins, John Uhlmann