Sciweavers

244 search results - page 13 / 49
» Optimizing Loop Performance for Clustered VLIW Architectures
Sort
View
EMSOFT
2004
Springer
14 years 1 months ago
An approach for integrating basic retiming and software pipelining
Basic retiming is an algorithm originally developed for hardware optimization. Software pipelining is a technique proposed to increase instruction-level parallelism for parallel p...
Noureddine Chabini, Wayne Wolf
MICRO
2005
IEEE
170views Hardware» more  MICRO 2005»
14 years 1 months ago
The TM3270 Media-Processor
We present the TM3270 media-processor, the latest TriMedia VLIW processor, tuned to address the performance demands of standard definition video processing, combined with embedded...
Jan-Willem van de Waerdt, Stamatis Vassiliadis, Sa...
IEEECIT
2010
IEEE
13 years 6 months ago
SAT: A Stream Architecture Template for Embedded Applications
- The increase of embedded applications complexity has demanded hardware more flexible while providing higher performance. Reconfigurable architectures and stream processing have b...
Qianming Yang, Nan Wu, Mei Wen, Yi He, Huayou Su, ...
SAMOS
2004
Springer
14 years 1 months ago
with Wide Functional Units
— Architectural resources and program recurrences are the main limitations to the amount of Instruction-Level Parallelism (ILP) exploitable from loops, the most time-consuming pa...
Miquel Pericàs, Eduard Ayguadé, Javi...
CLUSTER
2004
IEEE
13 years 11 months ago
Predicting memory-access cost based on data-access patterns
Improving memory performance at software level is more effective in reducing the rapidly expanding gap between processor and memory performance. Loop transformations (e.g. loop un...
Surendra Byna, Xian-He Sun, William Gropp, Rajeev ...