Abstract. Limited bandwidth to off-chip main memory is a performance bottleneck in chip multiprocessors for streaming computations, such as Cell/B.E., and this will become even mor...
We report the performance of NOW-Sort, a collection of sorting implementations on a Network of Workstations (NOW). We find that parallel sorting on a NOW is competitive with sortin...
Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau,...
Synthesizing architectural requirements from an application viewpoint can help in making important architectural design decisions towards building large scale parallel machines. I...
Anand Sivasubramaniam, Aman Singla, Umakishore Ram...
Sample sort, a generalization of quicksort that partitions the input into many pieces, is known as the best practical comparison-based sorting algorithm for distributed-memory para...
Several recent papers have proposed or analyzed optimal algorithms to route all-to-all personalized communication (AAPC) over communication networks such as meshes, hypercubes and ...