Sciweavers

272 search results - page 40 / 55
» Code Transformations to Improve Memory Parallelism
Sort
View
IPPS
2000
IEEE
14 years 1 months ago
Dynamic Data Layouts for Cache-Conscious Factorization of DFT
Effective utilization of cache memories is a key factor in achieving high performance in computing the Discrete Fourier Transform (DFT). Most optimizationtechniques for computing ...
Neungsoo Park, Dongsoo Kang, Kiran Bondalapati, Vi...
ICS
1999
Tsinghua U.
14 years 28 days ago
Reducing cache misses using hardware and software page placement
As the gap between memory and processor speeds continues to widen, cache efficiency is an increasingly important component of processor performance. Compiler techniques have been...
Timothy Sherwood, Brad Calder, Joel S. Emer
LCTRTS
2009
Springer
14 years 3 months ago
Eliminating the call stack to save RAM
Most programming languages support a call stack in the programming model and also in the runtime system. We show that for applications targeting low-power embedded microcontroller...
Xuejun Yang, Nathan Cooprider, John Regehr
IPPS
2007
IEEE
14 years 3 months ago
Load Miss Prediction - Exploiting Power Performance Trade-offs
— Modern CPUs operate at GHz frequencies, but the latencies of memory accesses are still relatively large, in the order of hundreds of cycles. Deeper cache hierarchies with large...
Konrad Malkowski, Greg M. Link, Padma Raghavan, Ma...
ASAP
2003
IEEE
133views Hardware» more  ASAP 2003»
14 years 1 months ago
Storage Management in Process Networks using the Lexicographically Maximal Preimage
At the Leiden Embedded Research Center, we are developing a compiler called Compaan that automatically translates signal processing applications written in Matlab into Kahn Proces...
Alexandru Turjan, Bart Kienhuis