Aggressive prefetching is very beneficial for memory latency tolerance of many applications. However, it faces significant challenges in multi-core systems. Prefetchers of diff...
An upgrade from dual-core to quad-core AMD processor on the Cray XT system at the Oak Ridge National Laboratory (ORNL) Leadership Computing Facility (LCF) has resulted in significa...
Sadaf R. Alam, Richard F. Barrett, Heike Jagode, J...
This paper presents a novel cycle-approximate performance estimation technique for automatically generated transaction level models (TLMs) for heterogeneous multicore designs. The...
The slow speed of conventional execution-driven architecture simulators is a serious impediment to obtaining desirable research productivity. This paper proposes and evaluates a f...
Emerging microprocessors offer unprecedented parallel computing capabilities and deeper memory hierarchies, increasing the importance of loop transformations in optimizing compile...