The UTS benchmark is used to evaluate task parallelism in OpenMP 3.0 as implemented in a number of recently released compilers and run-time systems. UTS performs parallel search of...
In this paper, we present four scheduling algorithms that provide flexible utilization of fine-grain DSP accelerators with low run-time overhead. Methods that have originally been...
Jani Boutellier, Shuvra S. Bhattacharyya, Olli Sil...
We describe compiler and run-time optimisations for effective auto-parallelisation of C++ programs on the Cell BE architecture. Auto-parallelisation is made easier by anno...
Current multimedia applications are characterized by highly dynamic and non-deterministic behavior as well as high-performance requirements. In addition, portable devices demand a...
Javier Resano, Diederik Verkest, Daniel Mozos, Ser...
Three-dimensional network-on-chip (3D NoC), the combination of NoC and die-stacking 3D IC technology, is motivated to achieve lower latency, lower power consumption, and higher ...