Sciweavers

16 search results - page 3 / 4
» A Multi-Threaded Streaming Pipeline Architecture for Large S...
Sort
View
HPCA
2009
IEEE
14 years 8 months ago
Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems
Linked data structure (LDS) accesses are critical to the performance of many large scale applications. Techniques have been proposed to prefetch such accesses. Unfortunately, many...
Eiman Ebrahimi, Onur Mutlu, Yale N. Patt
CASES
2006
ACM
13 years 11 months ago
Improving the performance and power efficiency of shared helpers in CMPs
Technology scaling trends have forced designers to consider alternatives to deeply pipelining aggressive cores with large amounts of performance accelerating hardware. One alterna...
Anahita Shayesteh, Glenn Reinman, Norman P. Jouppi...
PARA
1995
Springer
13 years 11 months ago
A Proposal for a Set of Parallel Basic Linear Algebra Subprograms
This paper describes a proposal for a set of Parallel Basic Linear Algebra Subprograms PBLAS. The PBLAS are targeted at distributed vector-vector, matrix-vector and matrixmatrix...
Jaeyoung Choi, Jack Dongarra, Susan Ostrouchov, An...
ISCA
1999
IEEE
110views Hardware» more  ISCA 1999»
13 years 11 months ago
Decoupling Local Variable Accesses in a Wide-Issue Superscalar Processor
Providing adequate data bandwidth is extremely important for a wide-issue superscalar processor to achieve its full performance potential. Adding a large number of ports to a data...
Sangyeun Cho, Pen-Chung Yew, Gyungho Lee
EUROGRAPHICS
2010
Eurographics
14 years 3 months ago
Fast Ray Sorting and Breadth-First Packet Traversal for GPU Ray Tracing
We present a novel approach to ray tracing execution on commodity graphics hardware using CUDA. We decompose a standard ray tracing algorithm into several data-parallel stages tha...
Kirill Garanzha and Charles Loop