PPOPP 2010 | Sciweavers

PPOPP
2010
ACM

Data transformations enabling loop vectorization on multithreaded data parallel architectures

14 years 8 months ago

Loop vectorization, a key feature exploited to obtain high performance on Single Instruction Multiple Data (SIMD) vector architectures, is significantly hindered by irregular memo...

Byunghyun Jang, Perhaad Mistry, Dana Schaa, Rodrig...


Symbolic prefetching in transactional distributed shared memory

14 years 8 months ago

Download demsky.eecs.uci.edu

We present a static analysis for the automatic generation of symbolic prefetches in a transactional distributed shared memory. A symbolic prefetch specifies the first object to be...

Alokika Dash, Brian Demsky


A distributed placement service for graph-structured and tree-structured data

14 years 8 months ago

Download www.cse.ohio-state.edu

Effective data placement strategies can enhance the performance of data-intensive applications implemented on high end computing clusters. Such strategies can have a significant i...

Gregory Buehrer, Srinivasan Parthasarathy, Shirish...


Using data structure knowledge for efficient lock generation and strong atomicity

14 years 8 months ago

Download cobweb.ecn.purdue.edu

To achieve high performance on multicore systems, shared-memory parallel languages must efficiently implement atomic operations. The commonly used and studied paradigms for atomici...

Gautam Upadhyaya, Samuel P. Midkiff, Vijay S. Pai


Continuous speculative program parallelization in software

14 years 8 months ago

Download www.cs.rochester.edu

This paper addresses the problem of extracting coarse-grained parallelism from large sequential code. It builds on BOP, a system for software speculative parallelization. BOP lets...

Chao Zhang, Chen Ding, Xiaoming Gu, Kirk Kelsey, T...


Fast tridiagonal solvers on the GPU

14 years 8 months ago

Download jcohen.name

We study the performance of three parallel algorithms and their hybrid variants for solving tridiagonal linear systems on a GPU: cyclic reduction (CR), parallel cyclic reduction (...

Yao Zhang, Jonathan Cohen, John D. Owens


A practical concurrent binary search tree

14 years 8 months ago

Download ppl.stanford.edu

We propose a concurrent relaxed balance AVL tree algorithm that is fast, scales well, and tolerates contention. It is based on optimistic techniques adapted from software transact...

Nathan Grasso Bronson, Jared Casper, Hassan Chafi,...


The LOFAR correlator: implementation and performance analysis

14 years 8 months ago

Download www.cs.vu.nl

LOFAR is the first of a new generation of radio telescopes. Rather than using expensive dishes, it forms a distributed sensor network that combines the signals from many thousands...

John W. Romein, P. Chris Broekema, Jan David Mol, ...


Featherweight X10: a core calculus for async-finish parallelism

14 years 8 months ago

Download www.cs.ucla.edu

We present a core calculus with two of X10's key constructs for parallelism, namely async and finish. Our calculus forms a convenient basis for type systems and static analys...

Jonathan K. Lee, Jens Palsberg


Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs?

14 years 9 months ago

Download www.cs.wm.edu

Most modern Chip Multiprocessors (CMP) feature shared cache on chip. For multithreaded applications, the sharing reduces communication latency among co-running threads, but also r...

Eddy Z. Zhang, Xipeng Shen, Yunlian Jiang
