We focus on automatically diagnosing different performance problems in parallel file systems by identifying, gathering and analyzing OS-level, black-box performance metrics on eve...
Michael P. Kasick, Jiaqi Tan, Rajeev Gandhi, Priya...
The Data-Flow Graph (DFG) of a parallel application is frequently used to take scheduling decisions, based on the information that it models (dependencies among the tasks and volu...
Rafael Ennes Silva, Guilherme P. Pezzi, Nicolas Ma...
Shared memory in a parallel computer provides prowith the valuable abstraction of a shared address space--through which any part of a computation can access any datum. Although un...
Targeted optimization of program segments can provide an additional program speedup over the highest default optimization level, such as -O3 in GCC. The key challenge is how to au...
Haiping Wu, Eunjung Park, Mihailo Kaplarevic, Ying...
Overlapping communication with computation is an important optimization on current cluster architectures; its importance is likely to increase as the doubling of processing power ...
Wei-Yu Chen, Dan Bonachea, Costin Iancu, Katherine...