Sciweavers

492 search results - page 75 / 99
» Predictable performance in SMT processors
Sort
View
ARCS
2008
Springer
13 years 9 months ago
An Optimized ZGEMM Implementation for the Cell BE
: The architecture of the IBM Cell BE processor represents a new approach for designing CPUs. The fast execution of legacy software has to stand back in order to achieve very high ...
Timo Schneider, Torsten Hoefler, Simon Wunderlich,...
AAECC
2007
Springer
111views Algorithms» more  AAECC 2007»
13 years 7 months ago
When cache blocking of sparse matrix vector multiply works and why
Abstract. We present new performance models and a new, more compact data structure for cache blocking when applied to the sparse matrixvector multiply (SpM×V) operation, y ← y +...
Rajesh Nishtala, Richard W. Vuduc, James Demmel, K...
MICRO
2005
IEEE
114views Hardware» more  MICRO 2005»
14 years 1 months ago
Address-Indexed Memory Disambiguation and Store-to-Load Forwarding
This paper describes a scalable, low-complexity alternative to the conventional load/store queue (LSQ) for superscalar processors that execute load and store instructions speculat...
Sam S. Stone, Kevin M. Woley, Matthew I. Frank
ASPLOS
2009
ACM
14 years 8 months ago
RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations
Miss rate curves (MRCs) are useful in a number of contexts. In our research, online L2 cache MRCs enable us to dynamically identify optimal cache sizes when cache-partitioning a s...
David K. Tam, Reza Azimi, Livio Soares, Michael St...
MICRO
2008
IEEE
138views Hardware» more  MICRO 2008»
14 years 1 months ago
Hybrid analytical modeling of pending cache hits, data prefetching, and MSHRs
As the number of transistors integrated on a chip continues to increase, a growing challenge is accurately modeling performance in the early stages of processor design. Analytical...
Xi E. Chen, Tor M. Aamodt