GPU-based heterogeneous clusters continue to draw attention from vendors and HPC users due to their high energy efficiency and much improved single-node computational performance...
The fifth release of the multithreaded language Cilk uses a provably good "work-stealing" scheduling algorithm similar to the first system, but the language has been completely r...
Matteo Frigo, Charles E. Leiserson, Keith H. Randa...
Using the well-known ATLAS and LAPACK dense linear algebra libraries, we demonstrate that the parallel management overhead (PMO) can grow with problem size on even statically sc...
The paper introduces Self-Replicating Objects (SROs), a new concurrent programming abstraction. An SRO is implemented and used much like an ordinary .NET object and can expose arbitrary us...
Given the large communication overheads characteristic of modern parallel machines, optimizations that eliminate, hide or parallelize communication may improve the performance of ...