Dependencies between iterations of loop structures cannot always be determined at compile-time because they may depend on input data that is known only at run-time. A prime example is a loop that accesses an array whose subscripts are themselves functions of another array determined only at run-time. To parallelize such loops, a run-time analysis is necessary. We describe a new algorithm to perform this analysis. The proposed method handles all types of data dependencies (flow, anti, and output) without requiring any special architectural support in the multiprocessor. Our scheme consists of an inspector, which builds the iteration schedule, and an executor, which uses that schedule to execute the iterations. This approach does not require any special synchronization operations during the inspector stage, and the executor can be implemented with or without synchronization support. It allows overlap among dependent iterations and requires very little inter-processor communication. Furthermore, the...
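The inspector/executor split can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the loop body `a[write_idx[i]] = a[read_idx[i]] + b[i]` and the names `inspector`, `executor`, `write_idx` and `read_idx` are hypothetical, and the inspector here uses simple wavefront (level) scheduling, a standard run-time technique, executed sequentially. Unlike the scheme described above, this executor places a barrier between wavefronts, so it does not overlap dependent iterations.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence


def inspector(write_idx: Sequence[int], read_idx: Sequence[int]) -> List[List[int]]:
    """Group the iterations of the (hypothetical) loop
        for i in range(n): a[write_idx[i]] = a[read_idx[i]] + b[i]
    into wavefronts so that every flow, anti and output dependence points
    from a lower wavefront to a strictly higher one."""
    n = len(write_idx)
    level = [0] * n
    last_write: Dict[int, int] = {}  # element -> wavefront of its latest writer
    max_read: Dict[int, int] = {}    # element -> highest wavefront that reads it
    for i in range(n):
        r, w = read_idx[i], write_idx[i]
        lvl = 0
        if r in last_write:          # flow dependence (read after write)
            lvl = max(lvl, last_write[r] + 1)
        if w in last_write:          # output dependence (write after write)
            lvl = max(lvl, last_write[w] + 1)
        if w in max_read:            # anti dependence (write after read)
            lvl = max(lvl, max_read[w] + 1)
        level[i] = lvl
        max_read[r] = max(max_read.get(r, -1), lvl)
        last_write[w] = lvl
    waves: List[List[int]] = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lvl in enumerate(level):
        waves[lvl].append(i)
    return waves


def executor(a: List[float], b: Sequence[float], write_idx: Sequence[int],
             read_idx: Sequence[int], waves: List[List[int]]) -> None:
    """Run the iterations of each wavefront concurrently; waiting for a
    whole wavefront before starting the next one acts as a barrier."""
    def body(i: int) -> None:
        a[write_idx[i]] = a[read_idx[i]] + b[i]

    with ThreadPoolExecutor() as pool:
        for wave in waves:
            list(pool.map(body, wave))  # blocks until the wave is done


if __name__ == "__main__":
    # Index arrays that a real program would only know at run time.
    write_idx = [1, 2, 0, 3, 1]
    read_idx = [0, 1, 2, 1, 3]
    b = [1.0, 2.0, 3.0, 4.0, 5.0]
    a_seq = [0.0, 10.0, 20.0, 30.0]
    a_par = list(a_seq)
    for i in range(len(b)):  # original sequential loop, used as a reference
        a_seq[write_idx[i]] = a_seq[read_idx[i]] + b[i]
    waves = inspector(write_idx, read_idx)
    executor(a_par, b, write_idx, read_idx, waves)
    print(waves)  # [[0], [1, 3], [2, 4]]
    assert a_par == a_seq
```

Iterations within one wavefront touch no element that another iteration of the same wavefront writes, so they may run in any order. The price of this simple design is the barrier: an iteration in wavefront k+1 waits for all of wavefront k, even for iterations it does not depend on.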
V. Prasad Krothapalli, Thulasiraman Jeyaraman, Mar