Abstract— We consider the task of omnidirectional path following for a quadruped robot: moving a four-legged robot along an arbitrary path while turning in an arbitrary manner. Learning a controller capable of such motion means learning the parameters of a very high-dimensional policy class, which in turn requires collecting a prohibitively large amount of data on the real robot. Although learning such a policy can be much easier in a model (or “simulator”) of the system, it can be extremely difficult to build a sufficiently accurate simulator. In this paper we propose a method that uses a (possibly inaccurate) simulator to identify a low-dimensional subspace of policies that is robust to variations in model dynamics. Because this policy class is low-dimensional, we can learn an instance from this class on the real system using much less data than would be required to learn a policy in the original class. In our approach, we sample several models from a distribution over the ...
J. Zico Kolter, Andrew Y. Ng
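
The general idea in the abstract (sample several simulator models, find a low-dimensional policy subspace, then learn within that subspace from a small amount of real data) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the "policy" is just a parameter vector, each sampled model's optimized policy is generated synthetically rather than by running a simulator, and the subspace is one-dimensional and found by a PCA-style power iteration, which stands in for whatever reduction the authors actually use. All function and variable names are hypothetical.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sampled_model_policy(rng, b, noise=0.01):
    """Stand-in for 'optimize a policy in one sampled model'.
    Assumption: the optimum lies near the line spanned by b."""
    z = rng.gauss(0.0, 1.0)
    return [z * bj + noise * rng.gauss(0.0, 1.0) for bj in b]

def top_direction(vectors, iters=200):
    """Leading principal direction (uncentered) via power iteration."""
    d = len(vectors[0])
    v = [1.0 / d ** 0.5] * d
    for _ in range(iters):
        # Apply X^T X to v without forming the matrix explicitly.
        proj = [dot(x, v) for x in vectors]
        w = [sum(p * x[j] for p, x in zip(proj, vectors)) for j in range(d)]
        norm = dot(w, w) ** 0.5
        v = [wj / norm for wj in w]
    return v

def fit_in_subspace(features, targets, v):
    """Least-squares fit of a single coefficient c so that
    (c * v) . f_i matches target y_i on the 'real' samples."""
    r = [dot(f, v) for f in features]
    c = dot(r, targets) / dot(r, r)
    return [c * vj for vj in v]

rng = random.Random(0)
D = 20                                   # full policy dimension
b = [rng.gauss(0.0, 1.0) for _ in range(D)]

# Step 1: policies optimized in 30 sampled models (synthetic here).
thetas = [sampled_model_policy(rng, b) for _ in range(30)]

# Step 2: identify a 1-D policy subspace shared across the models.
v = top_direction(thetas)

# Step 3: on the 'real system', fit one coefficient from 3 samples,
# far fewer than the 20 needed to identify all parameters directly.
theta_real = [1.5 * bj for bj in b]
features = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(3)]
targets = [dot(f, theta_real) for f in features]
theta_hat = fit_in_subspace(features, targets, v)

diff = [a - t for a, t in zip(theta_hat, theta_real)]
err = dot(diff, diff) ** 0.5
print("parameter error in subspace fit:", err)
```

In this toy setting three real samples suffice only because the search is restricted to the learned direction; fitting all 20 parameters directly from three samples would be underdetermined.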