Markov decision processes (MDPs) are a very popular tool for decision theoretic planning (DTP), partly because of the well-developed, expressive theory that includes effective solution techniques. But the Markov assumption--that dynamics and rewards depend on the current state only, and not on history--is often inappropriate. This is especially true of rewards: we frequently wish to associate rewards with behaviors that extend over time. Of course, such reward processes can be encoded in an MDP should we have a rich enough state space (where states encode enough history). However, it is often difficult to "hand craft" suitable state spaces that encode an appropriate amount of history. We consider this problem in the case where non-Markovian rewards are encoded by assigning values to formulas of a temporal logic. These formulas characterize the value of temporally extended behaviors. We argue that this allows a natural representation of many commonly encountered non-Markovian rewards.
Fahiem Bacchus, Craig Boutilier, Adam J. Grove
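
Below is a minimal sketch, in Python, of the idea described in the abstract: non-Markovian rewards expressed by attaching values to temporal formulas that are evaluated over the history of visited states rather than the current state alone. The propositions (`package_delivered`, `door_closed`, `crashed`), the particular formula constructors, and the reward values are illustrative assumptions, not the paper's representation or algorithm.

```python
# Sketch: non-Markovian rewards as values assigned to (past-)temporal formulas.
# Formulas here are simply predicates over the state history; the constructors
# and reward values below are hypothetical, chosen only to illustrate the idea.

from typing import Callable, List, Set

State = Set[str]        # a state is the set of atomic propositions true in it
History = List[State]   # the sequence of states visited so far

def holds(p: str) -> Callable[[History], bool]:
    """p is true in the current (last) state."""
    return lambda h: p in h[-1]

def once(p: str) -> Callable[[History], bool]:
    """p held at some point in the history."""
    return lambda h: any(p in s for s in h)

def since(p: str, q: str) -> Callable[[History], bool]:
    """q held at some past point, and p has held at every step since then."""
    return lambda h: any(q in h[i] and all(p in s for s in h[i + 1:])
                         for i in range(len(h)))

# A non-Markovian reward specification: values attached to temporal formulas.
reward_spec = [
    (since("door_closed", "package_delivered"), 10.0),   # a temporally extended behavior
    (holds("crashed"), -100.0),
]

def reward(history: History) -> float:
    """Reward depends on the whole history, violating the Markov assumption."""
    return sum(value for formula, value in reward_spec if formula(history))

# Two histories ending in the same state can receive different rewards.
h1 = [{"package_delivered"}, {"door_closed"}, {"door_closed"}]
h2 = [set(), {"door_closed"}, {"door_closed"}]
print(reward(h1))  # 10.0
print(reward(h2))  # 0.0
```

As the final example shows, the reward cannot be recovered from the last state alone; encoding it in an ordinary MDP would require enlarging the state space to carry the relevant history, which is exactly the hand-crafting burden the abstract argues against.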