With the rapid replacement of closed, homogeneous, proprietary HPC systems by heterogeneous Linux-MPI cluster systems, the state of performance monitoring and analysis tools has become a cause for concern. Proprietary systems, despite their drawbacks, provided consistent tools of high quality. Modern Linux cluster systems, on the other hand, draw on a wide variety of Open Source tools at differing stages of evolution. Recognizing that Linux clusters are here to stay, SiCortex has taken the approach of integrating and enhancing Open Source tools to form a production-quality suite. Further, as a tribute to the unrewarded developers of the Open Source community, and for more pragmatic reasons such as long-term sustainability, changes made to the tools are fed upstream to the original tool developers. In this paper, we present an overview of the SiCortex tool suite and some of the challenges and successes we encountered in building it.
Philip J. Mucci, Tushar Mohan