Sciweavers

EUROSYS
2010
ACM
14 years 3 months ago
Locating cache performance bottlenecks using data profiling
Effective use of CPU data caches is critical to good performance, but poor cache use patterns are often hard to spot using existing execution profiling tools. Typical profilers at...
Aleksey Pesterev, Nickolai Zeldovich, Robert T. Mo...
ISSS
1996
IEEE
123views Hardware» more  ISSS 1996»
14 years 3 months ago
Memory Organization for Improved Data Cache Performance in Embedded Processors
Code generation for embedded processors creates opportunities for several performance optimizations not applicable for traditional compilers. We present techniques for improving d...
Preeti Ranjan Panda, Nikil D. Dutt, Alexandru Nico...
SC
2000
ACM
14 years 3 months ago
Using Hardware Performance Monitors to Isolate Memory Bottlenecks
In this paper, we present and evaluate two techniques that use different styles of hardware support to provide data structure specific processor cache information. In one approach...
Bryan R. Buck, Jeffrey K. Hollingsworth
ISCA
2000
IEEE
111views Hardware» more  ISCA 2000»
14 years 3 months ago
Understanding the backward slices of performance degrading instructions
For many applications, branch mispredictions and cache misses limit a processor’s performance to a level well below its peak instruction throughput. A small fraction of static i...
Craig B. Zilles, Gurindar S. Sohi
HPCA
2000
IEEE
14 years 3 months ago
Software-Controlled Multithreading Using Informing Memory Operations
Memorylatency isbecominganincreasingly importantperformance bottleneck, especially in multiprocessors. One technique for tolerating memory latency is multithreading, whereby we sw...
Todd C. Mowry, Sherwyn R. Ramkissoon
MICRO
2002
IEEE
164views Hardware» more  MICRO 2002»
14 years 3 months ago
A quantitative framework for automated pre-execution thread selection
Pre-execution attacks cache misses for which conventional address-prediction driven prefetching is ineffective. In pre-execution, copies of cache miss computations are isolated fr...
Amir Roth, Gurindar S. Sohi
ISCA
2010
IEEE
232views Hardware» more  ISCA 2010»
14 years 3 months ago
Data marshaling for multi-core architectures
Previous research has shown that Staged Execution (SE), i.e., dividing a program into segments and executing each segment at the core that has the data and/or functionality to bes...
M. Aater Suleman, Onur Mutlu, José A. Joao,...
ICS
2003
Tsinghua U.
14 years 4 months ago
Enhancing memory level parallelism via recovery-free value prediction
—The ever-increasing computational power of contemporary microprocessors reduces the execution time spent on arithmetic computations (i.e., the computations not involving slow me...
Huiyang Zhou, Thomas M. Conte
SIGMETRICS
2003
ACM
147views Hardware» more  SIGMETRICS 2003»
14 years 4 months ago
Effect of node size on the performance of cache-conscious B+-trees
In main-memory databases, the number of processor cache misses has a critical impact on the performance of the system. Cacheconscious indices are designed to improve performance b...
Richard A. Hankins, Jignesh M. Patel
IPPS
2003
IEEE
14 years 4 months ago
Miss Penalty Reduction Using Bundled Capacity Prefetching in Multiprocessors
While prefetch has proven itself useful for reducing cache misses in multiprocessors, traffic is often increased due to extra unused prefetch data. Prefetching in multiprocessors...
Dan Wallin, Erik Hagersten