Sciweavers

» Scalable Parallel Matrix Multiplication on Distributed Memor...
HPCA
2007
IEEE
14 years 8 months ago
A Memory-Level Parallelism Aware Fetch Policy for SMT Processors
A thread executing on a simultaneous multithreading (SMT) processor that experiences a long-latency load will eventually stall while holding execution resources. Existing long-lat...
Stijn Eyerman, Lieven Eeckhout
WMPI
2004
ACM
14 years 1 month ago
Scalable cache memory design for large-scale SMT architectures
The cache hierarchy design in existing SMT and superscalar processors is optimized for latency, but not for bandwidth. The size of the L1 data cache did not scale over the past dec...
Muhamed F. Mudawar
ERSA
2007
13 years 9 months ago
A Scalable and Reconfigurable Shared-Memory Graphics Cluster Architecture
Abstract: If the computational demands of an interactive graphics rendering application cannot be met by a single commodity Graphics Processing Unit (GPU), multiple graphics accele...
Ross Brennan, Michael Manzke, Keith O'Conor, John ...
PPSC
1993
13 years 9 months ago
I/O for TFLOPS Supercomputers
Scalable parallel computers with TFLOPS (Trillion FLoating Point Operations Per Second) performance levels are now under construction. While we believe TFLOPS processor technology...
Erik DeBenedictis, Stephen C. Johnson
HIPC
2009
Springer
13 years 5 months ago
A performance prediction model for the CUDA GPGPU platform
The significant growth in computational power of modern Graphics Processing Units (GPUs) coupled with the advent of general purpose programming environments like NVIDIA's CUDA,...
Kishore Kothapalli, Rishabh Mukherjee, M. Suhail R...