Sciweavers

39 search results - page 3 / 8
» The CDAG: A Data Structure for Automatic Parallelization for...
ICS
2007
Tsinghua U.
14 years 1 month ago
Tradeoff between data-, instruction-, and thread-level parallelism in stream processors
This paper explores the scalability of the Stream Processor architecture along the instruction-, data-, and thread-level parallelism dimensions. We develop detailed VLSI-cost and ...
Jung Ho Ahn, Mattan Erez, William J. Dally
ICPP
2008
IEEE
14 years 1 month ago
Improving the Performance of Multithreaded Sparse Matrix-Vector Multiplication Using Index and Value Compression
The Sparse Matrix-Vector Multiplication kernel exhibits limited potential for taking advantage of modern shared memory architectures due to its large memory bandwidth re...
Kornilios Kourtis, Georgios I. Goumas, Nectarios K...
IEEECIT
2010
IEEE
13 years 6 months ago
Exploiting More Parallelism from Applications Having Generalized Reductions on GPU Architectures
Reduction is a common component of many applications, but can often be the limiting factor for parallelization. Previous reduction work has focused on detecting reduction idioms a...
Xiao-Long Wu, Nady Obeid, Wen-Mei Hwu
CGF
2010
13 years 7 months ago
Streaming-Enabled Parallel Dataflow Architecture for Multicore Systems
We propose a new framework design for exploiting multi-core architectures in the context of visualization dataflow systems. Recent hardware advancements have greatly increased the...
Huy T. Vo, Daniel K. Osmari, Brian Summa, Joã...
HPCA
2007
IEEE
14 years 7 months ago
Extending Multicore Architectures to Exploit Hybrid Parallelism in Single-thread Applications
Chip multiprocessors with multiple simpler cores are gaining popularity because they have the potential to drive future performance gains without exacerbating the problems of powe...
Hongtao Zhong, Steven A. Lieberman, Scott A. Mahlk...