Policy gradient methods for reinforcement learning avoid some of the undesirable properties of the value function approaches, such as policy degradation (Baxter and Bartlett, 2001...
Evan Greensmith, Peter L. Bartlett, Jonathan Baxte...
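The snippet above names policy gradient methods only in general terms. As a purely illustrative sketch of that class of estimators (not the specific method or experiments of the paper, which are not shown here), the following toy example applies a plain REINFORCE-style gradient update to a softmax policy on an assumed 2-armed bandit; every name and number is chosen only for illustration.

```python
# Illustrative sketch: a minimal REINFORCE-style policy gradient estimate for a
# softmax policy on a toy 2-armed bandit. The bandit, step sizes, and names are
# assumptions for illustration; this is not the paper's estimator.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.8])   # expected reward of each arm (toy data)
theta = np.zeros(2)                  # policy parameters: one logit per arm

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

alpha = 0.1
for step in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)                   # sample an action from the policy
    r = rng.normal(true_means[a], 1.0)           # noisy reward
    grad_log_pi = -probs                         # d/dtheta log pi(a | theta)
    grad_log_pi[a] += 1.0
    theta += alpha * r * grad_log_pi             # REINFORCE update (no baseline)

print(softmax(theta))  # probability mass should concentrate on the better arm
```

Because the raw reward multiplies the score function directly, this estimator is unbiased but can have high variance; subtracting a baseline from `r` is the usual remedy.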
The resource-constrained project scheduling problem (RCPSP) is an NP-hard benchmark problem in scheduling which takes into account the limited availability of resources in ...
We present a dynamic programming approach for the solution of first-order Markov decision processes. This technique uses an MDP whose dynamics are represented in a variant of the ...
Several reinforcement-learning techniques have already been applied to the Acrobot control problem, using linear function approximators to estimate the value function. In this pape...
Reinforcement learning problems are commonly tackled with temporal difference methods, which attempt to estimate the agent's optimal value function. In most real-world proble...
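The abstract above refers to temporal difference methods for estimating a value function. As a minimal sketch of the standard tabular TD(0) update (a textbook baseline, not the contribution of the paper), here is value estimation on an assumed 5-state random-walk chain; the environment and all names are illustrative assumptions.

```python
# Minimal sketch of tabular TD(0) value estimation on a toy 5-state random walk.
# The chain, rewards, and parameters are assumptions chosen for illustration only.
import random

n_states = 5                     # states 0..4; episodes end past either edge
V = [0.0] * n_states
alpha, gamma = 0.1, 1.0

for episode in range(5000):
    s = n_states // 2            # start in the middle of the chain
    while True:
        s_next = s + random.choice([-1, 1])
        if s_next < 0:           # terminate on the left with reward 0
            r, done = 0.0, True
        elif s_next >= n_states: # terminate on the right with reward 1
            r, done = 1.0, True
        else:
            r, done = 0.0, False
        target = r if done else r + gamma * V[s_next]
        V[s] += alpha * (target - V[s])   # TD(0): move toward bootstrapped target
        if done:
            break
        s = s_next

print([round(v, 2) for v in V])  # approaches [1/6, 2/6, 3/6, 4/6, 5/6]
```

The update uses the bootstrapped target r + γV(s') rather than a full return, which is what distinguishes temporal difference learning from Monte Carlo value estimation.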
We combine three threads of research on approximate dynamic programming: sparse random sampling of states, value function and policy approximation using local models, and using lo...
In many practical reinforcement learning problems, the state space is too large to permit an exact representation of the value function, much less the time required to compute it. ...
Recently developed dual techniques allow us to evaluate a given sub-optimal dynamic portfolio policy by using the policy to construct an upper bound on the optimal value function....
Partially Observable Markov Decision Processes have been studied widely as a model for decision making under uncertainty, and a number of methods have been developed to find the s...
This paper introduces the even-odd POMDP, an approximation to POMDPs in which the world is assumed to be fully observable every other time step. The even-odd POMDP can be converte...