Sciweavers

459 search results - page 51 / 92
» Using Kernel Couplings to Predict Parallel Application Perfo...
Sort
View
IEEEPACT
2002
IEEE
15 years 7 months ago
Optimizing Loop Performance for Clustered VLIW Architectures
Modern embedded systems often require high degrees of instruction-level parallelism (ILP) within strict constraints on power consumption and chip cost. Unfortunately, a high-perfo...
Yi Qian, Steve Carr, Philip H. Sweany
ICML
1997
IEEE
16 years 3 months ago
Predicting Multiprocessor Memory Access Patterns with Learning Models
Machine learning techniques are applicable to computer system optimization. We show that shared memory multiprocessors can successfully utilize machine learning algorithms for mem...
M. F. Sakr, Steven P. Levitan, Donald M. Chiarulli...
HPCA
2009
IEEE
16 years 3 months ago
Prediction router: Yet another low latency on-chip router architecture
Network-on-Chips (NoCs) are quite latency sensitive, since their communication latency strongly affects the application performance on recent many-core architectures. To reduce th...
Hiroki Matsutani, Michihiro Koibuchi, Hideharu Ama...
124
Voted
IPPS
2007
IEEE
15 years 8 months ago
Optimizing Inter-Nest Data Locality Using Loop Splitting and Reordering
With the increasing gap between processor speed and memory latency, the performance of data-dominated programs are becoming more reliant on fast data access, which can be improved...
Sofiane Naci
153
Voted
IPPS
2009
IEEE
15 years 9 months ago
Singular value decomposition on GPU using CUDA
Linear algebra algorithms are fundamental to many computing applications. Modern GPUs are suited for many general purpose processing tasks and have emerged as inexpensive high per...
Sheetal Lahabar, P. J. Narayanan