Performance management of clusters and Grids poses many challenges. Sharing large, distributed sets of resources can provide efficiencies, but it also makes adequate performance harder to provide and maintain. Current application requirements specify the amount of resources needed without explicitly characterizing the performance required from those resources. In clusters and Grids, inconsistent or highly variable application run-time indicates systemic inconsistency, with ramifications both for those running the application and for those managing the resources. We focus on the contribution of the interconnection network to application run-time variability. This work presents experimental results that characterize the sensitivity of parallel application run-time to variability in communication performance, obtained using an Application Communication Emulator (ACE).
Jeffrey J. Evans, Cynthia S. Hood