We present a scheme to compress branch trace information for use in snapshot-based microarchitecture simulation. The compressed trace can be used to warm any arbitrary branch pred...
The present work presents a cycle-level execution-driven simulator for modern GPU architectures. We discuss the simulation model used for our GPU simulator, based in the concept o...
Despite years of study, branch mispredictions remain as a significant performance impediment in pipelined superscalar processors. In general, the branch misprediction penalty can...
Stride-based prefetching mechanisms exploit regular streams of memory accesses to hide memory latency. While these mechanisms are effective, they can be improved by studying the p...
Detailed microarchitectural simulators are not well suited for exploring large design spaces due to their excessive simulation times. We introduce AXCIS, a framework for fast and ...
The performance of computer systems depends, among other things, on the workload. Performance evaluations are therefore often done using logs of workloads on current productions s...
Commercial processors have support for Simultaneous Multithreading (SMT), yet little work has been done to provide representative simulation results for SMT. Given a workload, cur...
Michael Van Biesbrouck, Lieven Eeckhout, Brad Cald...
Current simulation-sampling techniques construct accurate model state for each measurement by continuously warming large microarchitectural structures (e.g., caches and the branch...
Thomas F. Wenisch, Roland E. Wunderlich, Babak Fal...