A central problem in extending the von Neumann architecture to petaflop computers with millions of hardware threads and with a shared memory is defining the memory model [Lam79,...
Global address space languages like UPC exhibit high performance and portability on a broad class of shared and distributed memory parallel architectures. The most scalable applic...
Despite the large research efforts in the SW–DSM community, this technology has not yet been adopted widely for significant codes beyond benchmark suites. One of the reasons co...
The increasing number of cores, shared caches and memory nodes within machines introduces a complex hardware topology. High-performance computing applications now have to carefull...
Much research has been done in fast communication on clusters and in protocols for supporting software shared memory across them. However, the end performance of applications that...