Sciweavers

128 search results for "Hierarchically Optimal Average Reward Reinforcement Learning" (page 20 of 26)
COLT
2007
Springer
14 years 1 month ago
Strategies for Prediction Under Imperfect Monitoring
Abstract. We propose simple randomized strategies for sequential prediction under imperfect monitoring, that is, when the forecaster does not have access to the past outcomes but r...
Gábor Lugosi, Shie Mannor, Gilles Stoltz
AGENTS
1999
Springer
13 years 12 months ago
General Principles of Learning-Based Multi-Agent Systems
We consider the problem of how to design large decentralized multiagent systems (MAS’s) in an automated fashion, with little or no hand-tuning. Our approach has each agent run a...
David Wolpert, Kevin R. Wheeler, Kagan Tumer
SIAMCO
2000
13 years 7 months ago
The O.D.E. Method for Convergence of Stochastic Approximation and Reinforcement Learning
It is shown here that stability of the stochastic approximation algorithm is implied by the asymptotic stability of the origin for an associated ODE. This in turn implies convergen...
Vivek S. Borkar, Sean P. Meyn
GECCO
2005
Springer
14 years 1 month ago
Co-evolving recurrent neurons learn deep memory POMDPs
Recurrent neural networks are theoretically capable of learning complex temporal sequences, but training them through gradient-descent is too slow and unstable for practical use i...
Faustino J. Gomez, Jürgen Schmidhuber
ATAL
2010
Springer
13 years 7 months ago
PAC-MDP learning with knowledge-based admissible models
PAC-MDP algorithms approach the exploration-exploitation problem of reinforcement learning agents in an effective way which guarantees that with high probability, the algorithm pe...
Marek Grzes, Daniel Kudenko