In order to achieve high performance, contemporary microprocessors must effectively process the four major instruction types: ALU, branch, load, and store instructions. This paper...
Bryan Black, Brian Mueller, Stephanie Postal, Ryan...
This paper introduces an analysis technique, commutativity analysis, for automatically parallelizing computations that manipulate dynamic, pointer-based data structures. Commutati...
Tuning the performance of applications requires understanding the interactions between code and target architecture. This paper describes a performance modeling approach that not ...
In this paper, a comprehensive performance review of an MPI-based high-order three-dimensional spectral element method C++ toolbox is presented. The focus is put on the performance...
Christoph Bosshard, Roland Bouffanais, Christian C...
The H.264 video codec provides exceptional video compression while imposing dramatic increases in computational complexity over previous standards. While exploiting parallelism in...
Michael A. Baker, Pravin Dalale, Karam S. Chatha, ...