Loops are an important source of optimization. In this paper, we propose an extension to our work on loop unrolling and loop shifting for reconfigurable architectures. By applying unrolling and shifting to a small loop containing a hardware kernel and some software code, we relocate the function calls contained in the loop body such that in every iteration of the transformed loop, software functions (running on the GPP) execute in parallel with multiple instances of the kernel (running on FPGA). For larger loops containing an arbitrary number of kernels with pieces of code occurring in between the kernels, we study the effects of splitting the loop and applying the unrolling and shifting technique to the small achieved loops.