Abstract. With the number of computing elements spiraling to hundreds of thousands in modern HPC systems, failures are common events. Few applications are nevertheless fault toleran...
George Bosilca, Aurelien Bouteiller, Thomas H&eacu...
A comprehensive study of the whole petabyte-scale archival data of astronomical observatories could yield new science and new knowledge in the field, while it was not fe...
Performance Trees are a unifying framework for the specification of performance queries involving measures and requirements. This paper describes an evaluation environment for Pe...
Darren K. Brien, Nicholas J. Dingle, William J. Kn...
The use of a cluster for distributed performance analysis of parallel trace data is discussed. We propose an analysis architecture that uses multiple cluster nodes as a server to ...