Effective use of CPU data caches is critical to good performance, but poor cache use patterns are often hard to spot using existing execution profiling tools. Typical profilers at...
Aleksey Pesterev, Nickolai Zeldovich, Robert T. Mo...
Code generation for embedded processors creates opportunities for several performance optimizations not applicable for traditional compilers. We present techniques for improving d...
Preeti Ranjan Panda, Nikil D. Dutt, Alexandru Nico...
In this paper, we present and evaluate two techniques that use different styles of hardware support to provide data structure specific processor cache information. In one approach...
For many applications, branch mispredictions and cache misses limit a processor’s performance to a level well below its peak instruction throughput. A small fraction of static i...
Memorylatency isbecominganincreasingly importantperformance bottleneck, especially in multiprocessors. One technique for tolerating memory latency is multithreading, whereby we sw...
Pre-execution attacks cache misses for which conventional address-prediction driven prefetching is ineffective. In pre-execution, copies of cache miss computations are isolated fr...
Previous research has shown that Staged Execution (SE), i.e., dividing a program into segments and executing each segment at the core that has the data and/or functionality to bes...
—The ever-increasing computational power of contemporary microprocessors reduces the execution time spent on arithmetic computations (i.e., the computations not involving slow me...
In main-memory databases, the number of processor cache misses has a critical impact on the performance of the system. Cacheconscious indices are designed to improve performance b...
While prefetch has proven itself useful for reducing cache misses in multiprocessors, traffic is often increased due to extra unused prefetch data. Prefetching in multiprocessors...