This paper presents the design and evaluation of an adaptive, system sensitive partitioning and load balancing framework for distributed structured adaptive mesh refinement applic...
Existing supercomputers have hundreds of thousands of processor cores, and future systems may have hundreds of millions. Developers need detailed performance measurements to tune ...
Todd Gamblin, Bronis R. de Supinski, Martin Schulz...
Efficient memory allocation and data transfer for cluster-based data-intensive applications is a difficult task. Both changes in cluster interconnects and application workloads ...
We present a performance model-driven framework for automated performance tuning (autotuning) of sparse matrix-vector multiply (SpMV) on systems accelerated by graphics processing...
Automatic tuning has emerged as a solution to provide high-performance libraries for fast changing, increasingly complex computer architectures. We distinguish offline adaptation (...