This paper presents a hybrid approach to automatic parallelization of computer programs which combines static extraction of threads (tasks) with dynamic scheduling for parallel an...
Ronald Moore, Melanie Klang, Bernd Klauer, Klaus W...
Data locality is critical to achievinghigh performance on large-scale parallel machines. Non-local data accesses result in communication that can greatly impact performance. Thus ...
In this paper we describe the design and implementation of a flexible, and extensible, just-in-time ARM simulator designed to run co-operatively with a multi-core DSP simulator on...
Many MPSoC applications are loop-intensive and amenable to automatic parallelization with suitable compiler support. One of the key components of any compiler-parallelized code is...
The growing use of digital signal processors (DSPs) in embedded systems necessitates the use of optimizing compilers supporting special hardware features. Due to the irregular arc...