Sciweavers

PPOPP
2012
ACM
12 years 7 months ago
Better speedups using simpler parallel programming for graph connectivity and biconnectivity
Speedups demonstrated for finding the biconnected components of a graph: 9x to 33x on the Explicit Multi-Threading (XMT) many-core computing platform relative to the best serial ...
James A. Edwards, Uzi Vishkin
PPOPP
2012
ACM
PARRAY: a unifying array representation for heterogeneous parallelism
This paper introduces a programming interface called PARRAY (or Parallelizing ARRAYs) that supports system-level succinct programming for heterogeneous parallel systems like GPU c...
Yifeng Chen, Xiang Cui, Hong Mei
PPOPP
2012
ACM
Mechanizing the expert dense linear algebra developer
The efforts of an expert to parallelize and optimize a dense linear algebra algorithm for distributed-memory targets are largely mechanical and repetitive. We demonstrate that the...
Bryan Marker, Andy Terrel, Jack Poulson, Don S. Ba...
PPOPP
2012
ACM
DOJ: dynamically parallelizing object-oriented programs
We present Dynamic Out-of-Order Java (DOJ), a dynamic parallelization approach. In DOJ, a developer annotates code blocks as tasks to decouple these blocks from the parent executi...
Yong Hun Eom, Stephen Yang, James Christopher Jeni...
PPOPP
2012
ACM
Internally deterministic parallel algorithms can be fast
The virtues of deterministic parallelism have been argued for decades and many forms of deterministic parallelism have been described and analyzed. Here we are concerned with one ...
Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gib...
PPOPP
2012
ACM
Deterministic parallel random-number generation for dynamic-multithreading platforms
Existing concurrency platforms for dynamic multithreading do not provide repeatable parallel random-number generators. This paper proposes that a mechanism called pedigrees be bui...
Charles E. Leiserson, Tao B. Schardl, Jim Sukha
PPOPP
2012
ACM
CPHASH: a cache-partitioned hash table
CPHASH is a concurrent hash table for multicore processors. CPHASH partitions its table across the caches of cores and uses message passing to transfer lookups/inserts to a partit...
Zviad Metreveli, Nickolai Zeldovich, M. Frans Kaas...
PPOPP
2012
ACM
Concurrent breakpoints
In program debugging, reproducibility of bugs is a key requirement. Unfortunately, bugs in concurrent programs are notoriously difficult to reproduce because bugs due to concurre...
Chang-Seo Park, Koushik Sen
PPOPP
2012
ACM
A methodology for creating fast wait-free data structures
Lock-freedom is a progress guarantee that ensures overall program progress. Wait-freedom is a stronger progress guarantee that ensures the progress of each thread in the program. ...
Alex Kogan, Erez Petrank
PPOPP
2012
ACM
Chestnut: a GPU programming language for non-experts
Graphics processing units (GPUs) are powerful devices capable of rapid parallel computation. GPU programming, however, can be quite difficult, limiting its use to experienced prog...
Andrew Stromme, Ryan Carlson, Tia Newhall