Abstract. In this paper, we propose a universal network, called recursive dual-net (RDN). It can be used as a candidate of effective interconnection networks for massively parallel...
Abstract. This paper presents a simulator for of a decentralized modular grid scheduler named MaGate. MaGate’s design emphasizes scheduler interoperability by providing intellige...
Abstract. In many-core CMP architectures, the cache coherence protocol is a key component since it can add requirements of area and power consumption to the final design and, there...
We present the algorithm to multiply univariate polynomials with integer coefficients efficiently using the Number Theoretic transform (NTT) on Graphics Processing Units (GPU). The...
MapReduce has been prevalent for running data-parallel applications. By hiding other non-functionality parts such as parallelism, fault tolerance and load balance from programmers,...
Shengkai Zhu, Zhiwei Xiao, Haibo Chen, Rong Chen, ...
In this paper, a comprehensive performance review of an MPI-based high-order three-dimensional spectral element method C++ toolbox is presented. The focus is put on the performance...
Christoph Bosshard, Roland Bouffanais, Christian C...
SIMD extension is one of the most common and effective technique to exploit data-level parallelism in today’s processor designs. However, the performance of SIMD architectures i...
Abstract. With more cores integrated into one single chip, the overall power consumption from the multiple concurrent running programs increases dramatically in a CMP processor whi...