Smaller feature sizes, reduced voltage levels, higher transistor counts, and reduced noise margins make future generations of microprocessors increasingly prone to transient hardw...
The load-store unit is a performance critical component of a dynamically-scheduled processor. It is also a complex and non-scalable component. Several recently proposed techniques...
The advent of multicores presents a promising opportunity for speeding up sequential programs via profile-based speculative parallelization of these programs. In this paper we pr...
One of the main challenges of modern processor design is the implementation of a scalable and efficient mechanism to detect memory access order violations as a result of out-of-o...
High-performance processors use a large set–associative L1 data cache with multiple ports. As clock speeds and size increase such a cache consumes a significant percentage of t...
Dan Nicolaescu, Alexander V. Veidenbaum, Alexandru...