Model-free reinforcement learning as mixture learning

We cast model-free reinforcement learning as the problem of maximizing the likelihood of a probabilistic mixture model via sampling, addressing both the infinite-horizon and finite-horizon cases. We describe a Stochastic Approximation EM algorithm for likelihood maximization that, in the tabular case, is equivalent to a non-bootstrapping optimistic policy iteration algorithm like Sarsa(1) that can be applied in both MDPs and POMDPs. On the theoretical side, by relating the proposed stochastic EM algorithm to the family of optimistic policy iteration algorithms, we provide new tools that permit the design and analysis of algorithms in that family. On the practical side, preliminary experiments on a POMDP problem demonstrated encouraging results.
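To make the phrase "non-bootstrapping optimistic policy iteration" concrete, here is a minimal tabular sketch of that family in Python. It is not the Stochastic Approximation EM algorithm from the paper, only an illustration of the Sarsa(1)-style behaviour the abstract refers to: each action value is moved toward the full sampled return of an episode (no bootstrapping from the next state's estimate), and the policy changes after every episode instead of waiting for evaluation to converge. The 4-state chain environment, the epsilon-greedy policy, and all parameter values (`gamma`, `alpha`, `epsilon`, `episodes`) are assumptions chosen for illustration.

```python
import random

# Hypothetical 4-state chain: states 0..3, state 3 is terminal.
# Action 1 moves right, action 0 moves left (floored at state 0).
# Entering the terminal state yields reward 1; every other step yields 0.
N_STATES, TERMINAL, ACTIONS = 4, 3, (0, 1)


def step(state, action):
    next_state = min(state + 1, TERMINAL) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == TERMINAL else 0.0
    return next_state, reward


def epsilon_greedy(q, state, epsilon, rng):
    # Explore with probability epsilon, otherwise act greedily on q.
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[state][a])


def run_episode(q, epsilon, rng, max_steps=50):
    # Collect (state, action, reward) triples until termination or truncation.
    state, trajectory = 0, []
    for _ in range(max_steps):
        action = epsilon_greedy(q, state, epsilon, rng)
        next_state, reward = step(state, action)
        trajectory.append((state, action, reward))
        state = next_state
        if state == TERMINAL:
            break
    return trajectory


def monte_carlo_opi(episodes=2000, gamma=0.9, alpha=0.1, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        trajectory = run_episode(q, epsilon, rng)
        ret = 0.0
        # Walk backwards so `ret` is the discounted return from each step onward.
        for state, action, reward in reversed(trajectory):
            ret = reward + gamma * ret
            # Non-bootstrapping target: the sampled return itself,
            # not reward + gamma * q[next_state][next_action].
            q[state][action] += alpha * (ret - q[state][action])
        # The policy is implicit in q, so it already differs in the next
        # episode: this is the "optimistic" part of the scheme.
    return q


q_values = monte_carlo_opi()
greedy_policy = [max(ACTIONS, key=lambda a: q_values[s][a]) for s in range(TERMINAL)]
print("greedy action per non-terminal state:", greedy_policy)
```

Replacing the target `ret` with a one-step bootstrapped estimate would turn this into ordinary Sarsa(0); keeping the full return is what places the sketch in the non-bootstrapping family the abstract discusses.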

Nikos Vlassis, Marc Toussaint

ICML 2009 | Machine Learning | Optimistic Policy Iteration | Policy Iteration Algorithm | Stochastic EM Algorithm

Post Info

Added	17 Nov 2009
Updated	17 Nov 2009
Type	Conference
Year	2009
Where	ICML
Authors	Nikos Vlassis, Marc Toussaint
