An agent with limited consumable execution resources needs policies that attempt to achieve good performance while respecting these limitations. Otherwise, an agent (such as a plane) might fail catastrophically (crash) when it runs out of resources (fuel) at the wrong time (in midair). We present a new approach to constructing policies for agents with limited execution resources that builds on principles of real-time AI, as well as research in constrained Markov decision processes. Specifically, we formulate, solve, and analyze the policy optimization problem where constraints are imposed on the probability of exceeding the resource limits. We describe and empirically evaluate our solution technique to show that it is computationally reasonable, and that it generates policies that sacrifice some potential reward in order to make the kinds of precise guarantees about the probability of resource overutilization that are crucial for missioncritical applications.
Dmitri A. Dolgov, Edmund H. Durfee