Multiple clock cycles are needed to cross the global interconnects for multi-gigahertz designs in nanometer technologies. For synchronous designs, this requires retiming and pipelining on global interconnects. In this paper, we present a practical solution for simultaneous retiming and multilevel global placement for performance optimization, based on the theory and algorithms of sequential timing analysis (Seq-TA). We extend the Seq-TA to handle gates/clusters with multiple outputs and integrate it into a multilevel optimization framework for simultaneous retiming and placement. We also develop two speed-up techniques which enable the Seq-TA to be efficiently integrated into a simulated annealing-based multilevel coarse placement for large-scale designs. Experimental results show that (i) retiming can improve the performance (delay) by 14% on average when it is applied after placement; (ii) our approach for simultaneous retiming and placement can outperform the two-step approach (pla...