Emerging embedded system applications in multimedia and image processing are characterized by complex control flow consisting of deeply nested conditionals and loops. We present a technique called loop shifting that incrementally exploits loop level parallelism across iterations by shifting and compacting operations across loop iterations. Our experimental results show that loop shifting is particularly effective for the synthesis of designs with complex control especially when resource utilization is already high and/or under tight resource constraints. In situations when further loop unrolling (or initiating another iteration of the loop body) leads to a sharp increase in the longest combinational path in the circuit and the circuit area, loop shifting is able to achieve up to 20 % reduction in the input-to-output delay in the synthesized circuit. We implemented loop shifting within the SPARK parallelizing high-level synthesis framework and present results for experiments on designs...