For modern x86-based CPUs with increasingly long vector lengths, achieving good vectorization has become very important for gaining higher performance. Using very explicit SIMD ...
In Computed Tomography (CT) methods, Model-Based Iterative Reconstruction (MBIR) produces higher quality images than the commonly used Filtered Backprojection (FBP) but at a very high...
Xiao Wang, Amit Sabne, Sherman J. Kisner, Anand Ra...
Declarative programming has been hailed as a promising approach to parallel programming since it makes it easier to reason about programs while hiding the implementation details o...
To achieve good multi-core performance, modern microprocessors have weak memory models, rather than enforcing sequential consistency. This gives the programmer a wide scope for choo...
The MCS lock is one of the most prevalent queuing locks. It provides fair scheduling and high performance on massively parallel systems. However, the MCS lock mandates a bring-you...
Precise dynamic race detectors report an error if and only if more than one thread concurrently exhibits conflict on a memory access. They insert instrumentations at compile-time ...
High memory contention is generally agreed to be a worst-case scenario for concurrent data structures. There has been a significant amount of research effort spent investigating ...
Syed Kamran Haider, William Hasenplaugh, Dan Alist...
We introduce OPTIK, a new practical design pattern for designing and implementing fast and scalable concurrent data structures. OPTIK relies on the commonly-used technique of vers...
Parallel accelerators such as GPUs are notoriously hard to program; exploiting their full performance potential is a job best left for ninja programmers. High-level programming la...
Toomas Remmelg, Thibaut Lutz, Michel Steuwer, Chri...