Time skewing and loop tiling has been known for a long time to be a highly beneficial acceleration technique for nested loops especially on bandwidth hungry multi-core processors...
In recent years, GPUs have emerged as an extremely cost-effective means for achieving high performance. Many application developers, including those with no prior parallel program...
Mai Zheng, Vignesh T. Ravi, Feng Qin, Gagan Agrawa...
Developing parallel software using current tools can be challenging. Even experts find it difficult to reason about the use of locks and often accidentally introduce race condit...
James Christopher Jenista, Yong Hun Eom, Brian Dem...
We describe two novel constructs for programming parallel machines with multi-level memory hierarchies: call-up, which allows a child task to invoke computation on its parent, and...
Michael Bauer, John Clark, Eric Schkufza, Alex Aik...
The queue data structure is fundamental and ubiquitous. Lockfree versions of the queue are well known. However, an important open question is whether practical wait-free queues ex...
The sparse grid discretization technique enables a compressed representation of higher-dimensional functions. In its original form, it relies heavily on recursion and complex data...
Alin Florindor Murarasu, Josef Weidendorfer, Gerri...
We propose a cooperative methodology for multithreaded software, where threads use traditional synchronization idioms such as locks, but additionally document each point of potent...
Performance modeling for scientific applications is important for assessing potential application performance and systems procurement in high-performance computing (HPC). Recent ...