Sciweavers

50 search results - page 4 / 10
» Convergence and Divergence in Standard and Averaging Reinfor...
Sort
View
CORR
2007
Springer
73views Education» more  CORR 2007»
13 years 7 months ago
Universal Reinforcement Learning
—We consider an agent interacting with an unmodeled environment. At each time, the agent makes an observation, takes an action, and incurs a cost. Its actions can influence futu...
Vivek F. Farias, Ciamac Cyrus Moallemi, Tsachy Wei...
ICML
2000
IEEE
14 years 8 months ago
Reinforcement Learning in POMDP's via Direct Gradient Ascent
This paper discusses theoretical and experimental aspects of gradient-based approaches to the direct optimization of policy performance in controlled ??? ?s. We introduce ??? ?, a...
Jonathan Baxter, Peter L. Bartlett
ICML
2001
IEEE
14 years 8 months ago
Off-Policy Temporal Difference Learning with Function Approximation
We introduce the first algorithm for off-policy temporal-difference learning that is stable with linear function approximation. Off-policy learning is of interest because it forms...
Doina Precup, Richard S. Sutton, Sanjoy Dasgupta
JMLR
2010
119views more  JMLR 2010»
13 years 2 months ago
A Convergent Online Single Time Scale Actor Critic Algorithm
Actor-Critic based approaches were among the first to address reinforcement learning in a general setting. Recently, these algorithms have gained renewed interest due to their gen...
Dotan Di Castro, Ron Meir
WSC
2008
13 years 10 months ago
On step sizes, stochastic shortest paths, and survival probabilities in Reinforcement Learning
Reinforcement Learning (RL) is a simulation-based technique useful in solving Markov decision processes if their transition probabilities are not easily obtainable or if the probl...
Abhijit Gosavi