Sciweavers

955 search results - page 8 / 191
» Performance optimization of multiple memory architectures fo...
TPDS
2010
13 years 6 months ago
Performance Evaluation of Dynamic Speculative Multithreading with the Cascadia Architecture
Thread-level parallelism (TLP) has been extensively studied in order to overcome the limitations of exploiting instruction-level parallelism (ILP) on high-performance superscala...
David A. Zier, Ben Lee
ICPP
2003
IEEE
14 years 1 month ago
Procedural Level Address Offset Assignment of DSP Applications with Loops
Automatic optimization of address offset assignment for DSP applications, which reduces the number of address arithmetic instructions to meet the tight memory size restrictions an...
Youtao Zhang, Jun Yang
IPPS
1996
IEEE
13 years 12 months ago
A Method for Register Allocation to Loops in Multiple Register File Architectures
Multiple instruction issue processors place high demands on register file bandwidth. One solution to reduce this bottleneck is the use of multiple register files. Register allocat...
David J. Kolson, Alexandru Nicolau, Nikil D. Dutt,...
ICPP
2009
IEEE
14 years 2 months ago
Performance Models for Blocked Sparse Matrix-Vector Multiplication Kernels
Sparse Matrix-Vector multiplication (SpMV) is a very challenging computational kernel, since its performance depends greatly on both the input matrix and the underlying architec...
Vasileios Karakasis, Georgios I. Goumas, Nectarios...
MICRO
2002
IEEE
14 years 19 days ago
Vector vs. superscalar and VLIW architectures for embedded multimedia benchmarks
Multimedia processing on embedded devices requires an architecture that leads to high performance, low power consumption, reduced design complexity, and small code size. In this p...
Christoforos E. Kozyrakis, David A. Patterson