Supercomputer performance is highly dependent on its interconnection subsystem design. In this paper we study how di erent architectural approaches for router design impact into s...
Abstract. Program transformations are one of the most valuable compiler techniques to improve data locality. However, restructuring compilers have a hard time coping with data depe...
We study how several collective operations like broadcast, reduction, scan, etc. can be composed efficiently in complex parallel programs. Our specific contributions are: (1) a fo...
Sergei Gorlatch, Christoph Wedler, Christian Lenga...
We present a new pointer analysis for use in shared memory programs running on hierarchical parallel machines. The analysis is motivated by the partitioned global address space lan...
Thread escape analysis conservatively determines which objects may be accessed in more than one thread. Thread escape analysis is useful for a variety of purposes – finding rac...