Applications may not exploit enough of the logically available instantaneous communication concurrency to maximize hardware utilization on HPC systems. This is exacerbated in h...
Nicholas Chaimov, Khaled Z. Ibrahim, Samuel Willia...
Discrete GPUs in modern multi-GPU systems can transparently access each other’s memories through the PCIe interconnect. Future systems will improve this capability by including ...
Large scientific code bases are often composed of several layers of runtime libraries, implemented in multiple programming languages. In such situations, programmers often choose ...
Milind Chabbi, Wim Lavrijsen, Wibe de Jong, Koushi...
High-performance concurrent priority queues are essential for applications such as task scheduling and discrete event simulation. Unfortunately, even the best-performing implement...
Dan Alistarh, Justin Kopinsky, Jerry Li, Nir Shavi...
In the last few years, managed runtime environments such as the Java Virtual Machine (JVM) have been increasingly used on large-scale multicore servers. The garbage collector (GC) repre...
Maria Carpen Amarie, Patrick Marlier, Pascal Felbe...
Graph-structured analytics has been widely adopted in a number of big data applications such as social computation, web search, and recommendation systems. Though much prior resear...
Modern GPUs have been widely used to accelerate graph processing for complicated computational problems in graph theory. Many parallel graph algorithms adopt the asynch...
Irregular algorithms such as Stochastic Gradient Descent (SGD) can benefit from the massive parallelism available on GPUs. However, unlike in data-parallel algorithms, synchroniz...