When integrating software threads together to boost performance on a processor with instruction-level parallel processing support, it is rarely clear which code regions should be aligned and integrated, and which regions should be left alone. This problem grows even worse on a modern VLIW DSP due to complicating factors in both the hardware and compiler: software pipelining, predication, branch delay slots, load delay slots and limited resources. As a result, finding an effective integration strategy requires extensive iteration through the integrate/compile/analyze sequence. In this paper we introduce methods to quantitatively estimate the performance benefit from the integration of multiple software threads. We use resource modeling, consider register pressure and compensate for compiler optimizations. This enables different scenarios to be compared and ranked. We then use these estimates to guide integration by concentrating on the most beneficial scenario. Information from each it...
Won So, Alexander G. Dean