Dynamic programming provides a methodology to develop planners and controllers for nonlinear systems. However, general dynamic programming is computationally intractable. We have developed procedures that allow more complex planning and control problems to be solved. We use second order local trajectory optimization to generate locally optimal plans and local models of the value function and its derivatives. We maintain global consistency of the local models of the value function, guaranteeing that our locally optimal plans are actually globally optimal, up to the resolution of our search procedures. Learning to do the right thing at each instant in situations that evolve over time is di cult, as the future cost of actions chosen now may not be obvious immediately, and may only become clear with time. Value functions are a representational tool that makes the consequences of actions explicit. Value functions are di cult to learn directly, but they can be built up from learned models o...
Christopher G. Atkeson