The performance of parallel computing systems is strongly dependent on the runtime behaviour of parallel programs. This paper describes a new approach to measure and analyze the ru...
We present a parallel code generation algorithm for complete applications and a new experimental methodology that tests the efficacy of our approach. The algorithm optimizes for d...
UPC’s implicit communication and fine-grain programming style make application performance modeling a challenging task. The correspondence between remote references and communi...
Thread migration/checkpointing is becoming indispensable for load balancing and fault tolerance in high performance computing applications, and its success depends on the migration...
This paper introduces an analysis technique, commutativity analysis, for automatically parallelizing computations that manipulate dynamic, pointer-based data structures. Commutati...