The majority of work on Markov decision processes has focused on expected values of rewards in the objective function and expected costs in the constraints. Although several methods have been proposed to model risk-sensitive utility functions and constraints, they are only applicable to certain classes of utility functions and allow limited expressiveness in the constraints. We propose a construction that extends the standard linear programming formulation of MDPs by augmenting it with additional optimization variables, which allows us to compute the higher-order moments of the total cost (and/or reward). This greatly increases the expressive power of the model and supports reasoning about the probability distributions of the total cost (reward). Consequently, it allows us to formulate more interesting constraints and to model a wide range of utility functions. In particular, in this work we show how to formulate a constraint that bounds the probability of the total cost exceeding a given threshold.
Dmitri A. Dolgov, Edmund H. Durfee
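As background for the construction mentioned in the abstract, the following is a minimal sketch of the standard dual linear program for a discounted MDP, written in generic notation that is not taken from the paper itself (occupation measures $x_{sa}$, reward $r(s,a)$, transition probabilities $p(s'\,|\,s,a)$, discount factor $\gamma$, and initial state distribution $\alpha$):

\[
\begin{aligned}
\max_{x \ge 0} \quad & \sum_{s,a} r(s,a)\, x_{sa} \\
\text{subject to} \quad & \sum_{a} x_{s'a} \;-\; \gamma \sum_{s,a} p(s'\,|\,s,a)\, x_{sa} \;=\; \alpha(s') \qquad \forall s'.
\end{aligned}
\]

Here $x_{sa}$ can be read as the (discounted) expected number of times action $a$ is taken in state $s$, so the objective is the expected total reward. The construction described in the abstract augments a program of this form with additional optimization variables that track higher-order moments of the total cost, which is what makes probabilistic constraints such as the threshold constraint above expressible.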