The Merrimac supercomputer uses stream processors and a highradix network to achieve high performance at low cost and low power. The stream architecture matches the capabilities o...
Mattan Erez, Jung Ho Ahn, Ankit Garg, William J. D...
This paper describes a compiler for stream programs that efficiently schedules computational kernels and stream memory operations, and allocates on-chip storage. Our compiler uses...
The problem of writing high performance parallel applications becomes even more challenging when irregular, sparse or adaptive methods are employed. In this paper we introduce com...
It is widely known that parallel operation execution in multiprocessor systems generates a respective increase in memory accesses. Since the memory and bus subsystems provide a li...
Grigoris Dimitroulakos, Michalis D. Galanis, Costa...
Global address space languages like UPC exhibit high performance and portability on a broad class of shared and distributed memory parallel architectures. The most scalable applic...