Smaller feature sizes, reduced voltage levels, higher transistor counts, and reduced noise margins make future generations of microprocessors increasingly prone to transient hardw...
Streamlining communication is key to achieving good performance in shared-memory parallel programs. While full hardware support for cache coherence generally offers the best perfo...
In this paper, we consider the tree task graphs which arise from many important programming paradigms such as divide and conquer, branch and bound etc., and the linear task-graphs...
We explore runtime mechanisms and policies for scheduling dynamic multi-grain parallelism on heterogeneous multi-core processors. Heterogeneous multi-core processors integrate con...
Filip Blagojevic, Dimitrios S. Nikolopoulos, Alexa...
In chip multiprocessors (CMPs), data accesslatency dependson the memory hierarchy organization, the on-chip interconnect (NoC), and the running workload. Reducing data access late...