We examine the ability of CMPs, due to their lower onchip communication latencies, to exploit data parallelism at inner-loop granularities similar to that commonly targeted by vec...
CMPs enable simultaneous execution of multiple applications on the same platforms that share cache resources. Diversity in the cache access patterns of these simultaneously execut...
Keshavan Varadarajan, S. K. Nandy, Vishal Sharda, ...
Predication facilitates high-bandwidth fetch and large static scheduling regions, but has typically been too complex to implement comprehensively in out-of-order microarchitecture...
Since processor performance scalability will now mostly be achieved through thread-level parallelism, there is a strong incentive to parallelize a broad range of applications, inc...