The size of supercomputers in numbers of processors is growing exponentially. Today’s largest supercomputers have upwards of a hundred thousand processors and tomorrow’s may ha...
Mustafa M. Tikir, Michael Laurenzano, Laura Carrin...
Despite advances in the application of automated statistical and machine learning techniques to system log and trace data, there will always be a need for human analysis of machine...
We describe a methodology that enables the real-time diagnosis of performance problems in complex high-performance distributed systems. The methodology includes tools for generati...
Brian Tierney, William E. Johnston, Brian Crowley,...
The development of high-performance parallel applications for clusters is considered a complex task. This can happen because the influence of the execution environment a...
Lucas Mello Schnorr, Philippe Olivier Alexandre Na...
We present a new software technology for on-line performance analysis and visualization of complex parallel and distributed systems. Often heterogeneous, these systems need capabi...
Aleksandar M. Bakic, Matt W. Mutka, Diane T. Rover