Dense linear algebra codes are often expressed and coded in terms of BLAS calls. This approach, however, achieves suboptimal performance due to the overheads associated to such cal...
Today’s scalable high-performance applications heavily depend on the bandwidth characteristics of their communication patterns. Contemporary multi-stage interconnection networks...
This paper introduces two novel architectures for parallel decimal multipliers. Our multipliers are based on a new algorithm for decimal carry–save multioperand addition that us...
We study the performance of high-speed interconnects using a set of communication micro-benchmarks. The goal is to identify certain limiting factors and bottlenecks with these int...
Rod Fatoohi, Ken Kardys, Sumy Koshy, Soundarya Siv...
Abstract. A large number of MPI implementations are currently available, each of which emphasize different aspects of high-performance computing or are intended to solve a speciï¬...
Richard L. Graham, Timothy S. Woodall, Jeffrey M. ...