This paper introduces a programming interface called PARRAY (or Parallelizing ARRAYs) that supports system-level succinct programming for heterogeneous parallel systems like GPU c...
We present the Buffer Heap (BH), a cache-oblivious priority queue that supports Delete-Min, Delete, and Decrease-Key operations in O( 1 B log2 N B ) amortized block transfers fro...
In this paper we describe our experience with Teapot [7], a domain-specific language for writing cache coherence protocols. Cache coherence is of concern when parallel and distrib...
Satish Chandra, James R. Larus, Michael Dahlin, Br...
We consider the computation of shortest paths on Graphic Processing Units (GPUs). The blocked recursive elimination strategy we use is applicable to a class of algorithms (such as...
It has long been recognized that multi-stream operators, such as union and join, often have to wait idly in a temporarily blocked state, as a result of skews between the timestamp...
Yijian Bai, Hetal Thakkar, Haixun Wang, Carlo Zani...