Sciweavers

778 search results - page 93 / 156
» Efficient Code Generation for Automatic Parallelization and ...
CLUSTER
2003
IEEE
14 years 2 months ago
Improving the Performance of MPI Derived Datatypes by Optimizing Memory-Access Cost
The MPI Standard supports derived datatypes, which allow users to describe noncontiguous memory layout and communicate noncontiguous data with a single communication function. Thi...
Surendra Byna, William D. Gropp, Xian-He Sun, Raje...
ICVS
2001
Springer
14 years 1 month ago
Compiling SA-C Programs to FPGAs: Performance Results
Abstract. At the first ICVS, we presented SA-C (“sassy”), a single-assignment variant of the C programming language designed to exploit both coarse-grain and fine-grain parallel...
Bruce A. Draper, A. P. Wim Böhm, Jeffrey Hamm...
DATE
2009
IEEE
14 years 1 month ago
Automatically mapping applications to a self-reconfiguring platform
The inherent reconfigurability of SRAM-based FPGAs enables the use of configurations optimized for the problem at hand. Optimized configurations are smaller and faster than their g...
Karel Bruneel, Fatma Abouelella, Dirk Stroobandt
VLSISP
2002
13 years 9 months ago
A New Class of Efficient Block-Iterative Interference Cancellation Techniques for Digital Communication Receivers
A new and efficient class of nonlinear receivers is introduced for digital communication systems. These "iterated-decision" receivers use optimized multipass algorithms t...
Albert M. Chan, Gregory W. Wornell
EUROPAR
2006
Springer
14 years 1 month ago
Optimization of Dense Matrix Multiplication on IBM Cyclops-64: Challenges and Experiences
Abstract. This paper presents a study of performance optimization of dense matrix multiplication on IBM Cyclops-64 (C64) chip architecture. Although much has been published on how t...
Ziang Hu, Juan del Cuvillo, Weirong Zhu, Guang R. ...