: The performance of a statistical machine translation (SMT) system heavily depends on the quantity and quality of the bilingual language resource. However, the pervious work mainly focuses on the quantity and tries to collect more bilingual data. In this paper, we aim to optimize the bilingual corpus to improve the performance of the translation system. We propose methods to process the bilingual language data by filtering noise and selecting more informative sentences from the training corpus and the development corpus. The experimental results show that we can obtain a competitive performance using less data compared with using all available data.