We consider the problem of solving a nonhomogeneous infinite-horizon Markov Decision Process (MDP) in the general case of potentially multiple optimal first-period policies. More precisely, we seek an algorithm that, given a finite subset of the problem’s potentially infinite data set, delivers an optimal first-period policy. Applied recursively within a rolling-horizon procedure, such an algorithm generates an optimal solution to the original infinite-horizon problem. However, for a given problem no such algorithm may exist, in which case the problem cannot be solved with a finite amount of data. We say such problems fail to be well-posed. Under the assumption of increasing marginal returns in actions (with respect to states) and stochastically increasing states into which the system transitions (with respect to actions), we provide an algorithm that is guaranteed to solve the corresponding nonhomogeneous MDP whenever the pro...
Torpong Cheevaprawatdomrong, Irwin E. Schochetman,