Search Sciweavers | Sciweavers

1512 search results - page 220 / 303

» Qualitative reinforcement learning

CORR
2010
Springer

Mimicking the Behaviour of Idiotypic AIS Robot Controllers Using Probabilistic Systems

15 years 5 months ago

Previous work has shown that robot navigation systems that employ an architecture based upon the idiotypic network theory of the immune system have an advantage over control techn...

Amanda M. Whitbrook, Uwe Aickelin, Jonathan M. Gar...

CORR
2010
Springer

The Use of Probabilistic Systems to Mimic the Behaviour of Idiotypic AIS Robot Controllers

15 years 5 months ago

Previous work has shown that robot navigation systems that employ an architecture based upon the idiotypic network theory of the immune system have an advantage over control techn...

Amanda M. Whitbrook, Uwe Aickelin, Jonathan M. Gar...

ICML
2009
IEEE

Monte-Carlo simulation balancing

16 years 6 months ago

In this paper we introduce the first algorithms for efficiently learning a simulation policy for Monte-Carlo search. Our main idea is to optimise the balance of a simulation polic...

David Silver, Gerald Tesauro

ICML
2001
IEEE

Direct Policy Search using Paired Statistical Tests

16 years 6 months ago

Direct policy search is a practical way to solve reinforcement learning problems involving continuous state and action spaces. The goal becomes finding policy parameters that maxi...

Malcolm J. A. Strens, Andrew W. Moore

ECML
2007
Springer

Policy Gradient Critics

15 years 12 months ago

We present Policy Gradient Actor-Critic (PGAC), a new model-free Reinforcement Learning (RL) method for creating limited-memory stochastic policies for Partially Observable Markov ...

Daan Wierstra, Jürgen Schmidhuber
