In this paper, the effect of switch design on the application performance of cache-coherent non-uniform memory access (CC-NUMA) multiprocessors is studied in detail. Wormhole rout...
Laxmi N. Bhuyan, Hu-Jun Wang, Ravi R. Iyer, Akhile...
This paper presents the RTExpress™ environment, which is a software tool that assists a user in rapidly developing real-time embedded systems. RTExpress™ is a compiler and runtim...
Milissa M. Benincasa, Richard Besler, Diane Brassa...
Several methods have been proposed in the literature for the local enumeration of dense references for arrays distributed by the CYCLIC(k) data-distribution in High Performance For...
Gerardo Bandera, Pablo P. Trabado, Emilio L. Zapat...
Recent developments in networking technology cause a growing interest in connecting local-area clusters of workstations over wide-area links, creating multilevel clusters, or meta...
Henri E. Bal, Aske Plaat, Mirjam G. Bakker, Peter ...
This paper describes a new parallel algorithm for Minimum Cost Path computation on the Polymorphic Processor Array, a massively parallel architecture based on a reconfigurable mesh...
Vector prefix and reduction are collective communication primitives in which all processors must cooperate. We present two parallel algorithms, the direct algorithm and the split ...
Fully-populated tori, where every node has a processor attached, do not scale well since load on edges increases superlinearly with network size under heavy communication, resulti...