Sciweavers

PARA
2004
Springer
14 years 2 months ago
A Family of High-Performance Matrix Multiplication Algorithms
During the last half-decade, a number of research efforts have centered around developing software for generating automatically tuned matrix multiplication kernels. These include ...
John A. Gunnels, Fred G. Gustavson, Greg Henry, Ro...
IPPS
2000
IEEE
14 years 1 month ago
Scalable Parallel Matrix Multiplication on Distributed Memory Parallel Computers
Consider any known sequential algorithm for matrix multiplication over an arbitrary ring with time complexity O(N^α), where 2 < α ≤ 3. We show that such an algorithm can be parallelize...
Keqin Li
MEMOCODE
2010
IEEE
13 years 7 months ago
Feldspar: A domain specific language for digital signal processing algorithms
A new language, Feldspar, is presented, enabling high-level and platform-independent description of digital signal processing (DSP) algorithms. Feldspar is a pure functional langua...
Emil Axelsson, Koen Claessen, Gergely Dévai...
DATE
2002
IEEE
14 years 2 months ago
Memory System Connectivity Exploration
In programmable embedded systems, the memory subsystem represents a major cost, performance and power bottleneck. To optimize the system for such different goals, the designer wou...
Peter Grun, Nikil D. Dutt, Alexandru Nicolau
HPCA
2008
IEEE
14 years 9 months ago
Performance and power optimization through data compression in Network-on-Chip architectures
The trend towards integrating multiple cores on the same die has accentuated the need for larger on-chip caches. Such large caches are constructed as a multitude of smaller cache ...
Reetuparna Das, Asit K. Mishra, Chrysostomos Nicop...