Within the scope of multithreaded dataflow, the scheduling and allocation of DOACROSS loops has been discussed previously, and the so-called staggered allocation scheme was shown to offer higher performance and resource utilization than other schemes described in the literature. The staggered scheme, however, produces an unbalanced load among processors. This paper introduces an extension to the staggered scheme, the cyclic staggered scheme, that produces a more balanced distribution of iterations among processors. The cyclic staggered scheme is simulated and its performance improvement is analyzed.
Ali R. Hurson, Krishna M. Kavi, Joford T. Lim