Transactional Memory (TM) provides mechanisms that promise to simplify parallel programming by eliminating the need for locks and their associated problems (deadlock, livelock, pr...
Hassan Chafi, Jared Casper, Brian D. Carlstrom, Au...
Caches are organized at a line-size granularity to exploit spatial locality. However, when spatial locality is low, many words in the cache line are not used. Unused words occupy ...
Moinuddin K. Qureshi, M. Aater Suleman, Yale N. Pa...
If-conversion is a compiler technique that reduces the misprediction penalties caused by hard-to-predict branches, transforming control dependencies into data dependencies. Althou...
We apply a scalable approach for practical, comprehensive design space evaluation and optimization. This approach combines design space sampling and statistical inference to ident...
As process technologies continue to scale, the magnitude of within-die device parameter variations is expected to increase and may lead to significant timing variability. This pap...
With the advent of ubiquitous multi-core architectures, a major challenge is to simplify parallel programming. One way to tame one of the main sources of programming complexity, n...
Luis Ceze, Pablo Montesinos, Christoph von Praun, ...
Task-selection policies are critical to the performance of any architecture that uses speculation to extract parallel tasks from a sequential thread. This paper demonstrates that ...
Mayank Agarwal, Kshitiz Malik, Kevin M. Woley, Sam...