We present a timing optimization algorithm based on gate duplication in a technology-decomposed network. We first examine the relationship between gate duplication and delay reduction, and then introduce the notion of duplication gain for selecting good candidate gates to duplicate. The objective is to obtain the maximum delay reduction with the fewest duplications. Experiments on benchmark circuits demonstrate the performance of the algorithm. Our approach can also be combined with other technology-independent timing optimizers (such as speed-up) to achieve further delay improvement.
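
As a rough illustration of why duplicating a gate can shorten the critical path, the sketch below uses a toy load-dependent delay model. Everything in it is an assumption made for illustration and is not taken from the paper: the delay constants `INTRINSIC` and `PER_FANOUT`, the helper names, the treatment of primary inputs as source nodes with no fanins, and in particular the definition of `duplication_gain` as the plain difference in worst-case arrival time before and after one duplication.

```python
from typing import Dict, List

# Assumed toy delay model: each node costs a fixed intrinsic delay
# plus a penalty proportional to the number of gates it drives.
INTRINSIC = 1.0
PER_FANOUT = 0.5

Network = Dict[str, List[str]]  # node name -> list of its fanin nodes


def fanout_counts(fanins: Network) -> Dict[str, int]:
    """Count how many gate inputs each node drives."""
    counts = {g: 0 for g in fanins}
    for ins in fanins.values():
        for i in ins:
            counts[i] += 1
    return counts


def circuit_delay(fanins: Network) -> float:
    """Worst arrival time over all nodes of an acyclic network."""
    counts = fanout_counts(fanins)
    arrival: Dict[str, float] = {}

    def visit(g: str) -> float:
        if g not in arrival:
            start = max((visit(i) for i in fanins[g]), default=0.0)
            arrival[g] = start + INTRINSIC + PER_FANOUT * counts[g]
        return arrival[g]

    return max(visit(g) for g in fanins)


def duplicate(fanins: Network, gate: str, moved: List[str]) -> Network:
    """Return a copy of the network in which `gate` has a duplicate
    that takes over the fanout connections to the nodes in `moved`."""
    new = {g: list(ins) for g, ins in fanins.items()}
    copy = gate + "_dup"
    new[copy] = list(fanins[gate])  # the duplicate shares the original fanins
    for sink in moved:
        new[sink] = [copy if i == gate else i for i in new[sink]]
    return new


def duplication_gain(fanins: Network, gate: str, moved: List[str]) -> float:
    """Hypothetical gain: delay before minus delay after the duplication."""
    return circuit_delay(fanins) - circuit_delay(duplicate(fanins, gate, moved))


# Example: input a drives gate g, which drives four sinks.
example: Network = {
    "a": [], "g": ["a"],
    "x1": ["g"], "x2": ["g"], "x3": ["g"], "x4": ["g"],
}
balanced = duplication_gain(example, "g", ["x3", "x4"])
lopsided = duplication_gain(example, "g", ["x4"])
```

In this toy model, splitting the four sinks evenly gives a positive gain, while moving only one sink gives none: the duplicate adds load to `a`, and the original `g` is still heavily loaded. This is one way to see why candidate selection matters when the goal is maximum delay reduction with few duplications.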