Dependences among loads and stores whose addresses are unknown hinder the extraction of instruction level parallelism during the execution of a sequential program. Such ambiguous ...
Sridhar Gopal, T. N. Vijaykumar, James E. Smith, G...
Power consumption is a very important issue for HPC community, both at the level of one application or at the level of whole workload. Load imbalance of a MPI application can be e...
This paper introduces a novel technique which leverages value prediction and multithreading on a simultaneous multithreading processor to achieve higher performance in a single th...
— Driven by the need for higher bandwidth and complexity reduction, off-chip interconnect has evolved from proprietary busses to networked architectures. A similar evolution is o...
Paul Gratz, Changkyu Kim, Robert G. McDonald, Step...
We consider a variety of dynamic, hardware-based methods for exploiting load/store parallelism, including mechanisms that use memory dependence speculation. While previous work ha...