This paper introduces the CondorJ2 cluster management system. Traditionally, cluster management systems such as Condor employ a process-oriented approach with little or no use of modern database system technology. In contrast, CondorJ2 employs a data-centric, 3-tier web-application architecture for all system functions (e.g., job submission, monitoring and scheduling; node configuration, monitoring and management, etc.) except for job execution. Employing a data-oriented approach allows the core challenge (i.e., managing and coordinating a large set of distributed computing resources) to be transformed from a relatively low-level systems problem into a tract, higher-level data management problem. Preliminary results suggest that CondorJ2’s use of standard 3-tier software represents a significant step forward to the design and implementation of large clusters (1,000 to 10,000 nodes).
Eric Robinson, David J. DeWitt