Sciweavers

» Automatic parallelization for graphics processing units
MICRO
2010
IEEE
13 years 5 months ago
Throughput-Effective On-Chip Networks for Manycore Accelerators
As the number of cores and threads in manycore compute accelerators such as Graphics Processing Units (GPU) increases, so does the importance of on-chip interconnection network des...
Ali Bakhoda, John Kim, Tor M. Aamodt
CCGRID
2005
IEEE
14 years 1 month ago
OGSA-based grid workload monitoring
In heterogeneous and dynamic distributed systems like the Grid, detailed monitoring of workload and its resulting system performance (e.g. response time) is required to facilitate...
Rui Zhang, Steve Moyle, Steve McKeever, Stephen He...
ASPLOS
2010
ACM
14 years 2 months ago
COMPASS: a programmable data prefetcher using idle GPU shaders
A traditional fixed-function graphics accelerator has evolved into a programmable general-purpose graphics processing unit over the last few years. These powerful computing cores...
Dong Hyuk Woo, Hsien-Hsin S. Lee
PPOPP
2010
ACM
14 years 4 months ago
Model-driven autotuning of sparse matrix-vector multiply on GPUs
We present a performance model-driven framework for automated performance tuning (autotuning) of sparse matrix-vector multiply (SpMV) on systems accelerated by graphics processing...
Jee W. Choi, Amik Singh, Richard W. Vuduc
IPPS
2010
IEEE
13 years 5 months ago
Dynamic load balancing on single- and multi-GPU systems
The computational power provided by many-core graphics processing units (GPUs) has been exploited in many applications. The programming techniques currently employed on these GPUs...
Long Chen, Oreste Villa, Sriram Krishnamoorthy, Gu...