We present a technique for computing approximately optimal solutions to stochastic resource allocation problems modeled as Markov decision processes (MDPs). We exploit two key pro...
Nicolas Meuleau, Milos Hauskrecht, Kee-Eung Kim, L...
Policy evaluation is a critical step in the approximate solution of large Markov decision processes (MDPs), typically requiring O(|S|^3) to directly solve the Bellman system of |S...
The success of probabilistic model checking for discrete-time Markov decision processes and continuous-time Markov chains has led to rich academic and industrial applications. The ...
In this paper, we propose a novel adaptive step-size approach for policy gradient reinforcement learning. A new metric is defined for policy gradients that measures the effect of ...
Takamitsu Matsubara, Tetsuro Morimura, Jun Morimot...