Sciweavers

241 search results - page 11 / 49
» Advanced Loop Optimizations for Parallel Computers
IPPS
1996
IEEE
13 years 11 months ago
A Method for Register Allocation to Loops in Multiple Register File Architectures
Multiple instruction issue processors place high demands on register file bandwidth. One solution to reduce this bottleneck is the use of multiple register files. Register allocat...
David J. Kolson, Alexandru Nicolau, Nikil D. Dutt,...
ICPP
2002
IEEE
14 years 9 days ago
Optimal Code Size Reduction for Software-Pipelined Loops on DSP Applications
Code size expansion of software-pipelined loops is a critical problem for DSP systems with strict code size constraint. Some ad-hoc code size reduction techniques were used to try...
Qingfeng Zhuge, Zili Shao, Edwin Hsing-Mean Sha
PPOPP
2005
ACM
14 years 28 days ago
Performance modeling and optimization of parallel out-of-core tensor contractions
The Tensor Contraction Engine (TCE) is a domain-specific compiler for implementing complex tensor contraction expressions arising in quantum chemistry applications modeling elect...
Xiaoyang Gao, Swarup Kumar Sahoo, Chi-Chung Lam, J...
ICPPW
2008
IEEE
14 years 1 months ago
Performance Analysis and Optimization of Parallel Scientific Applications on CMP Cluster Systems
Chip multiprocessors (CMP) are widely used for high performance computing. Further, these CMPs are being configured in a hierarchical manner to compose a node in a cluster system....
Xingfu Wu, Valerie E. Taylor, Charles W. Lively, S...
ICPP
1996
IEEE
13 years 11 months ago
Restructuring Programs for High-Speed Computers with Polaris
The ability to automatically parallelize standard programming languages results in program portability across a wide range of machine architectures. It is the goal of the Polaris ...
William Blume, Rudolf Eigenmann, Keith Faigin, Joh...