This paper evaluates the use of per-node multi-threading to hide remote memory and synchronization latencies in a software DSM. As with hardware systems, multi-threading in softwa...
Modern Graphic Processing Units (GPUs) provide sufficiently flexible programming models that understanding their performance can provide insight in designing tomorrow’s manyco...
Ali Bakhoda, George L. Yuan, Wilson W. L. Fung, He...
This paper describes PARDIS, a system containing explicit support for interoperability of PARallel DIStributed applications. PARDIS is based on the Common Object Request Broker Ar...
Some of the most challenging applications to parallelize scalably are the ones that present a relatively small amount of computation per iteration. Multiple interacting performance...
With the right techniques, multicore architectures may be able to continue the exponential performance trend that elevated the performance of applications of all types for decades...
Arun Raman, Hanjun Kim, Thomas R. Mason, Thomas B....