This paper presents an algorithm for adapting periodic behavior to gradual shifts in task parameters. Since learning optimal control in high-dimensional domains is subject to the "curse of dimensionality", we parametrize the policy only along the limit cycle traversed by the gait, and thus focus the computational effort on a closed one-dimensional manifold embedded in the high-dimensional state space. We take an initial gait as a departure point and iterate between modifying the task slightly and adapting the gait to this modification. This creates a sequence of gaits, each optimized for a different variant of the task. Since consecutive gaits in this sequence are very similar, the whole sequence spans a two-dimensional manifold, and combining all policies in this 2-manifold provides additional robustness to the system. We demonstrate our approach on two simulations of bipedal robots: the compass gait walker, which is a four-dimensional system, and RABBIT, which is ten-dim...
Tom Erez, William D. Smart