We present a dynamic programming approach for the solution of first-order Markov decisions processes. This technique uses an MDP whose dynamics is represented in a variant of the ...
This paper presents an algorithm for robust optimal control of regular languages under specified uncertainty bounds on the event cost parameters of the language measure that has b...
Reinforcement learning induces non-stationarity at several levels. Adaptation to non-stationary environments is of course a desired feature of a fair RL algorithm. Yet, even if the...
— We describe a simple and intuitive policy gradient method for improving parametrized quadrocopter multi-flips by combining iterative experiments with information from a first...
In this paper, we use the rollout method for policy improvement to analyze a version of Klondike solitaire. This version, sometimes called thoughtful solitaire, has all cards reve...