In this paper, we present a design for a generic, open, application-oriented performance instrumentation of multitier applications. Measurements are performed through configur...
Markus Schmid, Marcus Thoss, Thomas Termin, Reinho...
VLIW machines possibly provide the most direct way to exploit instruction level parallelism; however, they cannot be used to emulate current general-purpose instruction set archit...
This work is devoted to the numerical resolution of the 4D Vlasov equation using an adaptive mesh of phase space. We previously proposed a parallel algorithm designed for distribut...
Existing supercomputers have hundreds of thousands of processor cores, and future systems may have hundreds of millions. Developers need detailed performance measurements to tune ...
Todd Gamblin, Bronis R. de Supinski, Martin Schulz...
STL dictionaries like map and set are commonly used in C++ programs. We consider parallelizing two of their bulk operations, namely the construction from many elements, and the ins...