With the goal to generate more scalable algorithms with higher efficiency and fewer open parameters, reinforcement learning (RL) has recently moved towards combining classical tec...
Reinforcement learning problems are commonly tackled with temporal difference methods, which attempt to estimate the agent's optimal value function. In most real-world proble...
We address the problem of automatically constructing basis functions for linear approximation of the value function of a Markov Decision Process (MDP). Our work builds on results ...
Off-policy reinforcement learning is aimed at efficiently reusing data samples gathered in the past, which is an essential problem for physically grounded AI as experiments are us...
Reinforcement learning problems are commonly tackled with temporal difference methods, which use dynamic programming and statistical sampling to estimate the long-term value of ta...