Sciweavers

65 search results for "Reducing traffic generated by conflict misses in caches" (page 13 of 13)

Distributed vector architectures
Stefanos Kaxiras. JSA 2000.
Integrating processors and main memory is a promising approach to increase system performance. Such integration provides very high memory bandwidth that can be exploited efficientl...

Picking Statistically Valid and Early Simulation Points
Erez Perelman, Greg Hamerly, Brad Calder. PACT 2003 (IEEE).
Modern architecture research relies heavily on detailed pipeline simulation. Simulating the full execution of an industry standard benchmark can take weeks to months to complete. ...

Fine-grain dynamic instruction placement for L0 scratch-pad memory
JongSoo Park, James D. Balfour, William J. Dally. CASES 2010 (ACM).
We present a fine-grain dynamic instruction placement algorithm for small L0 scratch-pad memories (SPMs), whose unit of transfer can be an individual instruction. Our algorithm ca...

Operation Stacking for Ensemble Computations With Variable Convergence
Mehmet Belgin, Godmar Back, Calvin J. Ribbens. IJHPCA 2010.
Sparse matrix operations achieve only small fractions of peak CPU speeds because of the use of specialized, index-based matrix representations, which degrade cache utilization by i...

Techniques for Efficient Processing in Runahead Execution Engines
Onur Mutlu, Hyesoon Kim, Yale N. Patt. ISCA 2005 (IEEE).
Runahead execution is a technique that improves processor performance by pre-executing the running application instead of stalling the processor when a long-latency cache miss occ...