Logic duplication is an effective method for improving circuit performance. In this paper we present an algorithm named SPD that performs simultaneous placement and duplication to minimize the longest path delay. We introduce the notion of feasible region and super feasible region to improve the critical path monotonicity from a global perspective. We introduce a constrained gain graph to perform optimal incremental legalization under complex constraints. We also formulate a timing-constrained global redundancy removal problem and propose a heuristic solution. Our SPD algorithm outperforms the state-of-the-art FPGA placement flow (T-VPack + VPR) with an average reduction of up to 27% in longest path estimate delay and 18% in routed delay. The increase in overall runtime is less than 2% and the increase in area is less than 1%. Categories and Subject Descriptors B.7.2 [Integrated Circuits]: Design Aids – placement and routing General Terms Algorithms, Design, Performance Keywords Log...