This paper presents a new superscalar architecture for fast discrete cosine transform (DCT). Comparing with the general SIMD architecture, it speeds up the DCT computation by a fac...
All-to-all personalized exchange is one of the most dense collective communication patterns and occurs in many important parallel computing/networking applications. In this paper,...
Time skewing is a compile-time optimization that can provide arbitrarily high cache hit rates for a class of iterative calculations, given a sufficient number of time steps and s...
HeteroSort load balances and sorts within static or dynamic networks using a conceptual torus mesh. We ported HeteroSort to a 16-node Beowulf cluster with a central switch architec...
Pamela Yang, Timothy M. Kunau, Bonnie Holte Bennet...
Trends in parallel computing indicate that heterogeneous parallel computing will be one of the most widespread platforms for computation-intensive applications. A heterogeneous com...