We consider reinforcement learning in the parameterized setup, where the model is known to belong to a parameterized family of Markov Decision Processes (MDPs). We further impose ...
Reinforcement learning problems are commonly tackled with temporal difference methods, which attempt to estimate the agent's optimal value function. In most real-world proble...
The scarcity and large fluctuations of link bandwidth in wireless networks have motivated the development of adaptive multimedia services in mobile communication networks, where i...
We introduce new, efficient algorithms for value iteration with multiple reward functions and continuous state. We also give an algorithm for finding the set of all nondominated a...
Daniel J. Lizotte, Michael H. Bowling, Susan A. Mu...
There has been a lot of recent work on Bayesian methods for reinforcement learning exhibiting near-optimal online performance. The main obstacle facing such methods is that in most...