Sciweavers

67 search results - page 4 / 14
» Data transformations enabling loop vectorization on multithr...
Sort
View
ICPP
2008
IEEE
14 years 1 months ago
Improving the Performance of Multithreaded Sparse Matrix-Vector Multiplication Using Index and Value Compression
Abstract—The Sparse Matrix-Vector Multiplication kernel exhibits limited potential for taking advantage of modern shared memory architectures due to its large memory bandwidth re...
Kornilios Kourtis, Georgios I. Goumas, Nectarios K...
ICPPW
2006
IEEE
14 years 1 months ago
Multidimensional Dataflow-based Parallelization for Multimedia Instruction Set Extensions
In retargeting loop-based code for multimedia instruction set extensions, a critical issue is that vector data types of mixed precision within a loop body complicate the paralleli...
Lewis B. Baumstark Jr., Linda M. Wills
APLAS
2005
ACM
14 years 1 months ago
Transformation to Dynamic Single Assignment Using a Simple Data Flow Analysis
This paper presents a novel method to construct a dynamic single assignment (DSA) form of array-intensive, pointer-free C programs (or in any other procedural language). A program ...
Peter Vanbroekhoven, Gerda Janssens, Maurice Bruyn...
HPCA
2007
IEEE
14 years 7 months ago
Extending Multicore Architectures to Exploit Hybrid Parallelism in Single-thread Applications
Chip multiprocessors with multiple simpler cores are gaining popularity because they have the potential to drive future performance gains without exacerbating the problems of powe...
Hongtao Zhong, Steven A. Lieberman, Scott A. Mahlk...
PLDI
1993
ACM
13 years 11 months ago
Global Optimizations for Parallelism and Locality on Scalable Parallel Machines
Data locality is critical to achievinghigh performance on large-scale parallel machines. Non-local data accesses result in communication that can greatly impact performance. Thus ...
Jennifer-Ann M. Anderson, Monica S. Lam