Sciweavers

272 search results - page 49 / 55
» Code Transformations to Improve Memory Parallelism
Sort
View
CLUSTER
2007
IEEE
14 years 4 months ago
Balancing productivity and performance on the cell broadband engine
— The Cell Broadband Engine (BE) is a heterogeneous multicore processor, combining a general-purpose POWER architecture core with eight independent single-instructionmultiple-dat...
Sadaf R. Alam, Jeremy S. Meredith, Jeffrey S. Vett...
MICRO
2006
IEEE
135views Hardware» more  MICRO 2006»
14 years 3 months ago
Support for High-Frequency Streaming in CMPs
As the industry moves toward larger-scale chip multiprocessors, the need to parallelize applications grows. High inter-thread communication delays, exacerbated by over-stressed hi...
Ram Rangan, Neil Vachharajani, Adam Stoler, Guilhe...
EUROPAR
2001
Springer
14 years 2 months ago
Performance of High-Accuracy PDE Solvers on a Self-Optimizing NUMA Architecture
High-accuracy PDE solvers use multi-dimensional fast Fourier transforms. The FFTs exhibits a static and structured memory access pattern which results in a large amount of communic...
Sverker Holmgren, Dan Wallin
LCTRTS
2007
Springer
14 years 4 months ago
Tetris: a new register pressure control technique for VLIW processors
The run-time performance of VLIW (very long instruction word) microprocessors depends heavily on the effectiveness of its associated optimizing compiler. Typical VLIW compiler pha...
Weifeng Xu, Russell Tessier
LCTRTS
2005
Springer
14 years 3 months ago
Cache aware optimization of stream programs
Effective use of the memory hierarchy is critical for achieving high performance on embedded systems. We focus on the class of streaming applications, which is increasingly preval...
Janis Sermulins, William Thies, Rodric M. Rabbah, ...