Several recent papers have proposed or analyzed optimal algorithms to route all-to-all personalizedcommunication (AAPC) over communication networks such as meshes, hypercubes and ...
Parallel independent disks can enhance the performance of external memory (EM) algorithms, but the programming task is often di cult. In this paper we develop randomized variants ...
Sample sort, a generalization of quicksort that partitions the input into many pieces, is known as the best practical comparison based sorting algorithm for distributed memory para...
We present a novel approach to ray tracing execution on commodity graphics hardware using CUDA. We decompose
a standard ray tracing algorithm into several data-parallel stages tha...
Cache-oblivious algorithms have the advantage of achieving good sequential cache complexity across all levels of a multi-level cache hierarchy, regardless of the specifics (cache...
Guy E. Blelloch, Phillip B. Gibbons, Harsha Vardha...