Sciweavers

JMLR
2012
12 years 15 days ago
Contextual Bandit Learning with Predictable Rewards
Contextual bandit learning is a reinforcement learning problem where the learner repeatedly receives a set of features (context), takes an action and receives a reward based on th...
Alekh Agarwal, Miroslav Dudík, Satyen Kale,...
KI
2007
Springer
14 years 4 months ago
Making a Robot Learn to Play Soccer Using Reward and Punishment
In this paper, we show how reinforcement learning can be applied to real robots to achieve optimal robot behavior. As an example, we enable an autonomous soccer robot to learn interce...
Heiko Müller, Martin Lauer, Roland Hafner, Sa...
ATAL
2003
Springer
14 years 3 months ago
A selection-mutation model for q-learning in multi-agent systems
Although well understood in the single-agent framework, the use of traditional reinforcement learning (RL) algorithms in multi-agent systems (MAS) is not always justified. The fe...
Karl Tuyls, Katja Verbeeck, Tom Lenaerts
ICML
2009
IEEE
14 years 11 months ago
Predictive representations for policy gradient in POMDPs
We consider the problem of estimating the policy gradient in Partially Observable Markov Decision Processes (POMDPs) with a special class of policies that are based on Predictive ...
Abdeslam Boularias, Brahim Chaib-draa
CIS
2005
Springer
14 years 3 months ago
An RLS-Based Natural Actor-Critic Algorithm for Locomotion of a Two-Linked Robot Arm
Recently, actor-critic methods have drawn much interest in the area of reinforcement learning, and several algorithms have been studied along the line of the actor-critic strategy...
Jooyoung Park, Jongho Kim, Daesung Kang