In this paper, we use a bandwidth-centric job communication model that captures the interaction and impact of simultaneously co-allocating jobs across multiple clusters. Our parallel job model seeks to capture both local and global communication access patterns. This allows us to explore scheduling strategies that attempt to improve average job turnaround time by selectively mapping jobs across cluster boundaries, a process known as job co-allocation. We focus on scheduling strategies that use available information, such as network link utilization, per-processor bandwidths, and job communication topology, to make intelligent decisions regarding application partition sizes and job placement. We provide results that help to establish the relationship between the quantity of information available a priori to the scheduler and its ability to improve overall system performance. Additionally, we demonstrate the dramatic imp...
William M. Jones, Walter B. Ligon III, Nishant Shr