As supercomputers are built from an ever-increasing number of processing elements, the effort required to achieve a substantial fraction of system peak performance continues to grow. Developers and computing center staff need tools that give holistic indicators of applications' resource consumption and of potential performance pitfalls at scale. To use the full potential of a supercomputer today, applications must incorporate multilevel parallelism (threading and message passing) and carefully orchestrate file I/O. Consequently, performance tools must be able to monitor these system components in an integrated way and at full machine scale. We present IPM, a modularized monitoring approach for MPI, OpenMP, file I/O, and other event sources. We describe the design principles of its implementation, which target efficiency and minimal application perturbation, and present an application study of using IPM at scale.
Karl Fürlinger, Nicholas J. Wright, David Ski