Sciweavers

PPOPP
2015
ACM
8 years 6 months ago
Barrier elision for production parallel programs
Large scientific code bases are often composed of several layers of runtime libraries, implemented in multiple programming languages. In such situations, programmers often choose ...
Milind Chabbi, Wim Lavrijsen, Wibe de Jong, Koushi...
PPOPP
2015
ACM
8 years 6 months ago
The SprayList: a scalable relaxed priority queue
High-performance concurrent priority queues are essential for applications such as task scheduling and discrete event simulation. Unfortunately, even the best performing implement...
Dan Alistarh, Justin Kopinsky, Jerry Li, Nir Shavi...
PPOPP
2015
ACM
8 years 6 months ago
A performance study of Java garbage collectors on multicore architectures
In the last few years, managed runtime environments such as the Java Virtual Machine (JVM) have been increasingly used on large-scale multicore servers. The garbage collector (GC) repre...
Maria Carpen Amarie, Patrick Marlier, Pascal Felbe...
PPOPP
2015
ACM
8 years 6 months ago
NUMA-aware graph-structured analytics
Graph-structured analytics has been widely adopted in a number of big data applications such as social computation, web-search and recommendation systems. Though much prior resear...
Kaiyuan Zhang, Rong Chen, Haibo Chen
PPOPP
2015
ACM
8 years 6 months ago
Optimization of asynchronous graph processing on GPU with hybrid coloring model
Modern GPUs have been widely used to accelerate the graph processing for complicated computational problems regarding graph theory. Many parallel graph algorithms adopt the asynch...
Xuanhua Shi, Junling Liang, Sheng Di, Bingsheng He...
PPOPP
2015
ACM
8 years 6 months ago
Stochastic gradient descent on GPUs
Irregular algorithms such as Stochastic Gradient Descent (SGD) can benefit from the massive parallelism available on GPUs. However, unlike in data-parallel algorithms, synchroniz...
Rashid Kaleem, Sreepathi Pai, Keshav Pingali
PPOPP
2015
ACM
8 years 6 months ago
Supporting multiple accelerators in high-level programming models
Computational accelerators, such as manycore NVIDIA GPUs, Intel Xeon Phi and FPGAs, are becoming common in workstations, servers and supercomputers for scientific and engineering...
Yonghong Yan, Pei-Hung Lin, Chunhua Liao, Bro...
PPOPP
2015
ACM
8 years 6 months ago
Adaptive GPU cache bypassing
Modern graphics processing units (GPUs) include hardware-controlled caches to reduce bandwidth requirements and energy consumption. However, current GPU cache hierarchies are ine...
Yingying Tian, Sooraj Puthoor, Joseph L. Greathous...
PPOPP
2015
ACM
8 years 6 months ago
Predicate RCU: an RCU for scalable concurrent updates
Read-copy update (RCU) is a shared memory synchronization mechanism with scalable synchronization-free reads that nevertheless execute correctly with concurrent updates. To guaran...
Maya Arbel, Adam Morrison