Policy Tree: Adaptive Representation for Policy Gradient

Much of the focus on finding good representations in reinforcement learning has been on learning complex non-linear predictors of value. Policy gradient algorithms, which directly represent the policy, often need fewer parameters to learn good policies. However, they typically employ a fixed parametric representation that may not be sufficient for complex domains. This paper introduces the Policy Tree algorithm, which can learn an adaptive representation of policy in the form of a decision tree over different instantiations of a base policy. Policy gradient is used both to optimize the parameters and to grow the tree by choosing splits that enable the maximum local increase in the expected return of the policy. Experiments show that this algorithm can choose genuinely helpful splits and significantly improve upon the commonly used linear Gibbs softmax policy, which we choose as our base policy.
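The abstract names the linear Gibbs softmax policy as the base policy that each leaf of the tree instantiates. The following is a minimal Python/NumPy sketch under stated assumptions, not the authors' implementation: it assumes each state is described by a feature vector `phi`, that internal nodes split on a single binary feature, and that every leaf holds its own parameter matrix `theta`. The names `softmax_policy`, `grad_log_policy`, `Leaf`, `Split`, `find_leaf` and `act` are hypothetical, and the paper's split-selection step is not shown.

```python
import numpy as np


def softmax_policy(theta, phi):
    """Linear Gibbs softmax: pi(a|s) proportional to exp(theta[a] . phi(s)).

    theta: (n_actions, n_features) parameter matrix
    phi:   (n_features,) feature vector for the current state
    """
    logits = theta @ phi
    logits = logits - logits.max()  # subtract the max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()


def grad_log_policy(theta, phi, action):
    """Gradient of log pi(action|s) with respect to theta.

    Row b equals phi * (1[b == action] - pi(b|s)).
    """
    probs = softmax_policy(theta, phi)
    indicator = np.zeros_like(probs)
    indicator[action] = 1.0
    return np.outer(indicator - probs, phi)


class Leaf:
    """Leaf holding one instantiation of the base policy (its own theta)."""

    def __init__(self, theta):
        self.theta = theta


class Split:
    """Internal node testing one binary feature (assumed split form)."""

    def __init__(self, feature_index, if_false, if_true):
        self.feature_index = feature_index
        self.if_false = if_false
        self.if_true = if_true


def find_leaf(node, phi):
    """Route a state's feature vector down the tree until a leaf is reached."""
    while isinstance(node, Split):
        node = node.if_true if phi[node.feature_index] > 0 else node.if_false
    return node


def act(tree, phi, rng):
    """Sample an action from the base policy stored at the state's leaf."""
    leaf = find_leaf(tree, phi)
    probs = softmax_policy(leaf.theta, phi)
    return int(rng.choice(len(probs), p=probs)), leaf


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_actions, n_features = 3, 4
    # Hypothetical tree: one split on feature 0, two leaves with zero parameters.
    tree = Split(0, Leaf(np.zeros((n_actions, n_features))),
                 Leaf(np.zeros((n_actions, n_features))))
    phi = np.array([1.0, 0.5, 0.0, 1.0])
    a, leaf = act(tree, phi, rng)
    # Illustrative REINFORCE-style step on that leaf only (step size 0.1, return 1.0).
    leaf.theta += 0.1 * 1.0 * grad_log_policy(leaf.theta, phi, a)
```

In this sketch only the leaf that produced an action receives a gradient update, so each leaf's parameters are optimized separately. Per-leaf gradients of this kind are the sort of quantity a split-selection rule could compare when asking which split would most increase the expected return locally, although the paper's exact criterion is not reproduced here.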

Ujjwal Das Gupta, Erik Talvitie, Michael Bowling

AAAI 2015 | Intelligent Agents

Added	27 Mar 2016
Updated	27 Mar 2016
Type	Conference
Year	2015
Where	AAAI
