BlueGene/L is currently the world’s fastest supercomputer. It consists of a large number of low power dual-processor compute nodes interconnected by high speed torus and collect...
Abstract--Parallel programming is hard, because it is impractical to test all possible thread interleavings. One promising approach to improve a multi-threaded program's relia...
We describe a generic programming model to design collective communications on SMP clusters. The programming model utilizes shared memory for collective communications and overlap...
Embedded systems consisting of the application program ROM, RAM, the embedded processor core and any custom hardware on a single wafer are becoming increasingly common in areas suc...
Various studies have shown that OS jitter can degrade parallel program performance considerably at large processor counts. Most sources of system jitter fall broadly into 5 catego...