Partitioning and clustering are crucial steps in circuit layout for handling large scale designs enabled by the deep submicron technologies. Retiming is an important sequential logic optimization technique for reducing the clock period by optimally repositioning ip ops 7]. In our exploration of a logical and physical co-design ow, we developed a highly e cient algorithm on combining retiming with circuit partitioning or clustering for clock period minimization. Compared with the recent result by Pan et al. 10] on quasioptimal clustering with retiming, our algorithm is able to reduce both runtime and memory requirement by one order of magnitude without losing quality. Our results show that our algorithm can be over 1000X faster for large designs.