– In this paper we present a new approach to benchmark the performance of shared memory systems. This approach focuses on recognizing how far off the performance of a given memor...
In this paper we study the problems of sorting and selection on the Distributed Memory Bus Computer (DMBC) recently introduced by Sahni. In particular we present: 1) An efficient a...
— Large parallel computers require techniques to tolerate the potentially large latencies of accessing remote data. Multithreadingis onesuch technique. We extend previous studies...
Henk L. Muller, Paul W. A. Stallard, David H. D. W...
— Skewed-associative caches use several hash functions to reduce collisions in caches without increasing the associativity. This technique can increase the hit ratio of a cache w...
Henk L. Muller, Paul W. A. Stallard, David H. D. W...
Loop fusion improves data locality and reduces synchronization in data-parallel applications. However, loop fusion is not always legal. Even when legal, fusion may introduce loop-...
– In this paper we present experiments with a class of graph partitioning algorithms that reduce the size of the graph by collapsing vertices and edges, partition the smaller gra...
Data-parallel primitives for performing operations on the PM1 quadtree and the bucket PMR quadtree are presented using the scan model. Algorithms are described for building these ...
If G is a connected graph with N nodes, its r dimensional product contains Nr nodes. We present an algorithm which sorts Nr keys stored in the rdimensional product of any graph G ...
Software barriers have been designed and evaluated for barrier synchronization in large-scale shared-memory multiprocessors, under the assumption that all processorsreach the sync...