This paper introduces a method for improving program run-time performance by gathering work in an application and executing it efficiently in an integrated thread. Our methods ext...
Current shared memory multicore and multiprocessor systems are nondeterministic. Each time these systems execute a multithreaded application, even if supplied with the same input,...
Joseph Devietti, Brandon Lucia, Luis Ceze, Mark Os...
In this paper, we present a methodology for profiling parallel applications executing on the IBM PowerXCell 8i (commonly referred to as the “Cell” processor). Specifically, we...
Hikmet Dursun, Kevin J. Barker, Darren J. Kerbyson...
Fast networks have made it possible to coordinate distributed heterogeneous CPU, memory, and storage resources to provide a powerful platform for executing high-performance applic...
SCALASCA is a performance toolset that has been specifically designed to analyze parallel application execution behavior on large-scale systems. It offers an incremental performan...
Markus Geimer, Felix Wolf, Brian J. N. Wylie, Erik...