Quadtree matrices using Morton-order storage provide natural blocking on every level of a memory hierarchy. Writing the natural recursive algorithms to take advantage of this bloc...
To better manage the ever increasing complexity of LAM/MPI, we have created a lightweight component architecture for it that is specifically designed for high-performance message p...
Compiled communication has recently been proposed to improve communication performance for clusters of workstations. The idea of compiled communication is to apply more aggressive...
Collecting a program’s execution profile is important for many reasons: code optimization, memory layout, program debugging and program comprehension. Path based execution pro...