Sciweavers

PPOPP
2015
ACM
8 years 3 months ago
Predicate RCU: an RCU for scalable concurrent updates
Read-copy update (RCU) is a shared memory synchronization mechanism with scalable synchronization-free reads that nevertheless execute correctly with concurrent updates. To guaran...
Maya Arbel, Adam Morrison
PPOPP
2015
ACM
8 years 3 months ago
A collection-oriented programming model for performance portability
This paper describes Surge, a collection-oriented programming model that enables programmers to compose parallel computations using nested high-level data collections and operator...
Saurav Muralidharan, Michael Garland, Bryan C. Cat...
PPOPP
2015
ACM
8 years 3 months ago
Optimization for performance and energy for batched matrix computations on GPUs
As modern hardware keeps evolving, an increasingly effective approach to develop energy efficient and high-performance solvers is to design them to work on many small size indepe...
Azzam Haidar, Tingxing Dong, Piotr Luszczek, Stani...
PPOPP
2015
ACM
8 years 3 months ago
Effects of source-code optimizations on GPU performance and energy consumption
This paper studies the effects of source-code optimizations on the performance, power draw, and energy consumption of a modern compute GPU. We evaluate 128 versions of two n-body ...
Jared Coplin, Martin Burtscher
PPOPP
2015
ACM
8 years 3 months ago
More than you ever wanted to know about synchronization: synchrobench, measuring the impact of the synchronization on concurrent algorithms
In this paper, we present the most extensive comparison of synchronization techniques. We evaluate 5 different synchronization techniques through a series of 31 data structure alg...
Vincent Gramoli
PPOPP
2015
ACM
8 years 3 months ago
SYNC or ASYNC: time to fuse for distributed graph-parallel computation
Large-scale graph-structured computation usually exhibits iterative and convergence-oriented computing nature, where input data is computed iteratively until a convergence conditi...
Chenning Xie, Rong Chen, Haibing Guan, Binyu Zang,...
PPOPP
2015
ACM
8 years 3 months ago
A library for portable and composable data locality optimizations for NUMA systems
Many recent multiprocessor systems are realized with a nonuniform memory architecture (NUMA) and accesses to remote memory locations take more time than local memory accesses. Opt...
Zoltan Majo, Thomas R. Gross
PPOPP
2015
ACM
8 years 3 months ago
Automatic scalable atomicity via semantic locking
In this paper, we consider concurrent programs in which the shared state consists of instances of linearizable ADTs (abstract data types). We present an automated approach to concurrency ...
Guy Golan-Gueta, G. Ramalingam, Mooly Sagiv, Eran ...
PPOPP
2015
ACM
8 years 3 months ago
RaftLib: a C++ template library for high performance stream parallel processing
Stream processing or data-flow programming is a compute paradigm that has been around for decades in many forms yet has failed to garner the same attention as other mainstream langu...
Jonathan C. Beard, Peng Li, Roger D. Chamberlain