The efficiency of peer-to-peer (P2P) systems depends largely on how the overlay is constructed. Because logical neighbors are selected at random, P2P systems often suffer from a serious topology mismatch between the overlay and the underlying physical network. This mismatch causes unnecessary duplication of query messages at both the overlay and IP levels and increases query response time. In this work, we define the optimal overlay problem and prove that it is NP-hard. We then propose a distributed overlay optimization algorithm to address the mismatch and evaluate its effectiveness through trace-driven simulations. The proposed design has four strengths. First, it requires no global knowledge. Second, it converges quickly. Third, it is orthogonal to other advanced search approaches. Fourth, it reduces both traffic cost and search latency.