Clusters of workstations are becoming popular platforms for parallel computing, but performance on these systems is more complex and harder to predict than on traditional parallel machines. Performance monitoring and analysis are therefore important for understanding application behavior and improving performance. We present a performance monitor for HPVM, a high-performance cluster running Windows NT. The novel features of our monitor are an integrated approach to performance information, a software global clock that correlates performance information across cluster nodes, and the use of Windows NT performance monitoring facilities. We discuss the design issues for this tool and present results from using it to analyze the performance of a cluster application.
Geetanjali Sampemane, Scott Pakin, Andrew A. Chien