Large scientific code bases are often composed of several layers of runtime libraries, implemented in multiple programming languages. In such situations, programmers often choose ...
Milind Chabbi, Wim Lavrijsen, Wibe de Jong, Koushi...
High-performance concurrent priority queues are essential for applications such as task scheduling and discrete event simulation. Unfortunately, even the best performing implement...
Dan Alistarh, Justin Kopinsky, Jerry Li, Nir Shavi...
In the last few years, managed runtime environments such as the Java Virtual Machine (JVM) are increasingly used on large-scale multicore servers. The garbage collector (GC) repre...
Maria Carpen Amarie, Patrick Marlier, Pascal Felbe...
Graph-structured analytics has been widely adopted in a number of big data applications such as social computation, web-search and recommendation systems. Though much prior resear...
Modern GPUs have been widely used to accelerate the graph processing for complicated computational problems regarding graph theory. Many parallel graph algorithms adopt the asynch...
Irregular algorithms such as Stochastic Gradient Descent (SGD) can benefit from the massive parallelism available on GPUs. However, unlike in data-parallel algorithms, synchroniz...
Computational accelerators, such as manycore NVIDIA GPUs, Intel Xeon Phi and FPGAs, are becoming common in workstations, servers and supercomputers for scientific and engineering...
Yonghong Yan, Pei-Hung Lin, Chunhua Liao, Bro...
Modern graphics processing units (GPUs) include hardware-controlled caches to reduce bandwidth requirements and energy consumption. However, current GPU cache hierarchies are ine...
Yingying Tian, Sooraj Puthoor, Joseph L. Greathous...