Sciweavers

1512 search results - page 220 / 303
» Qualitative reinforcement learning
CORR
2010
Springer
13 years 9 months ago
Mimicking the Behaviour of Idiotypic AIS Robot Controllers Using Probabilistic Systems
Previous work has shown that robot navigation systems that employ an architecture based upon the idiotypic network theory of the immune system have an advantage over control techn...
Amanda M. Whitbrook, Uwe Aickelin, Jonathan M. Gar...
CORR
2010
Springer
13 years 9 months ago
The Use of Probabilistic Systems to Mimic the Behaviour of Idiotypic AIS Robot Controllers
Previous work has shown that robot navigation systems that employ an architecture based upon the idiotypic network theory of the immune system have an advantage over control techn...
Amanda M. Whitbrook, Uwe Aickelin, Jonathan M. Gar...
ICML
2009
IEEE
14 years 10 months ago
Monte-Carlo simulation balancing
In this paper we introduce the first algorithms for efficiently learning a simulation policy for Monte-Carlo search. Our main idea is to optimise the balance of a simulation polic...
David Silver, Gerald Tesauro
ICML
2001
IEEE
14 years 10 months ago
Direct Policy Search using Paired Statistical Tests
Direct policy search is a practical way to solve reinforcement learning problems involving continuous state and action spaces. The goal becomes finding policy parameters that maxi...
Malcolm J. A. Strens, Andrew W. Moore
ECML
2007
Springer
14 years 3 months ago
Policy Gradient Critics
We present Policy Gradient Actor-Critic (PGAC), a new model-free Reinforcement Learning (RL) method for creating limited-memory stochastic policies for Partially Observable Markov ...
Daan Wierstra, Jürgen Schmidhuber