Abstract—The Sparse Matrix-Vector Multiplication kernel exhibits limited potential for taking advantage of modern shared memory architectures due to its large memory bandwidth re...
Kornilios Kourtis, Georgios I. Goumas, Nectarios K...
In retargeting loop-based code for multimedia instruction set extensions, a critical issue is that vector data types of mixed precision within a loop body complicate the paralleli...
This paper presents a novel method to construct a dynamic single assignment (DSA) form of array-intensive, pointer-free C programs (or in any other procedural language). A program ...
Peter Vanbroekhoven, Gerda Janssens, Maurice Bruyn...
Chip multiprocessors with multiple simpler cores are gaining popularity because they have the potential to drive future performance gains without exacerbating the problems of powe...
Hongtao Zhong, Steven A. Lieberman, Scott A. Mahlk...
Data locality is critical to achievinghigh performance on large-scale parallel machines. Non-local data accesses result in communication that can greatly impact performance. Thus ...