Clusters of workstations are increasingly being viewed as a cost-effective alternative to parallel supercomputers. However, resource management and scheduling on workstation clusters are complicated by the fact that the number of idle workstations available for executing parallel applications is constantly fluctuating. In this paper, we present a case for scheduling parallel applications on non-dedicated workstation clusters using dynamic space-sharing, a policy under which the number of processors allocated to an application can be changed during its execution. We describe an approach that uses application-level checkpointing and data repartitioning for supporting dynamic space-sharing and for handling the dynamic reconfiguration triggered when failure or owner activity is detected on a workstation being used by a parallel application. The performance advantages of dynamic space-sharing are quantified through a simulation study, and experimental results are presented for the overhead of dy...
Abdur Chowdhury, Lisa D. Nicklas, Sanjeev Setia, E