This paper proposes a learning framework for a central pattern generator (CPG)-based biped locomotion controller using a policy gradient method. Our goal is to develop an efficient learning algorithm by reducing the dimensionality of the state space used for learning. Numerical simulations demonstrate that the proposed method can acquire an appropriate feedback controller for the CPG-based controller within a few thousand trials. Furthermore, we implement the learned controller on a physical biped robot and show experimentally that it works successfully in a real environment.