Passive Dynamic Walking (PDW), a class of biped locomotion, has been recognized as energy-efficient and as a key to understanding human walking. Although PDW is sensitive to initial conditions and disturbances, Quasi-PDW, which incorporates supplemental actuators, has been studied as a way to overcome this sensitivity. In this article, we propose a reinforcement learning method designed specifically for Quasi-PDW of a biped robot whose knees make the system unstable. Simulations show that learning is accomplished quickly, after 1000 episodes, and that the obtained controller is robust against variations in the slope gradient and sudden perturbations. © 2006 Elsevier B.V. All rights reserved.