Sciweavers

NN
2010
Springer
187views Neural Networks» more  NN 2010»
13 years 6 months ago
Efficient exploration through active learning for value function approximation in reinforcement learning
Appropriately designing sampling policies is highly important for obtaining better control policies in reinforcement learning. In this paper, we first show that the least-squares ...
Takayuki Akiyama, Hirotaka Hachiya, Masashi Sugiya...
CDC
2010
IEEE
139views Control Systems» more  CDC 2010»
13 years 6 months ago
Q-learning and enhanced policy iteration in discounted dynamic programming
We consider the classical finite-state discounted Markovian decision problem, and we introduce a new policy iteration-like algorithm for finding the optimal state costs or Q-facto...
Dimitri P. Bertsekas, Huizhen Yu
CDC
2010
IEEE
136views Control Systems» more  CDC 2010»
13 years 6 months ago
Pathologies of temporal difference methods in approximate dynamic programming
Approximate policy iteration methods based on temporal differences are popular in practice, and have been tested extensively, dating to the early nineties, but the associated conve...
Dimitri P. Bertsekas
AUTOMATICA
2008
74views more  AUTOMATICA 2008»
13 years 11 months ago
Policy iteration based feedback control
It is well known that stochastic control systems can be viewed as Markov decision processes (MDPs) with continuous state spaces. In this paper, we propose to apply the policy iter...
Kan-Jian Zhang, Yan-Kai Xu, Xi Chen, Xi-Ren Cao
NIPS
2003
14 years 24 days ago
Bounded Finite State Controllers
We describe a new approximation algorithm for solving partially observable MDPs. Our bounded policy iteration approach searches through the space of bounded-size, stochastic fini...
Pascal Poupart, Craig Boutilier
NIPS
2001
14 years 24 days ago
Model-Free Least-Squares Policy Iteration
We propose a new approach to reinforcement learning which combines least squares function approximation with policy iteration. Our method is model-free and completely off policy. ...
Michail G. Lagoudakis, Ronald Parr
AAAI
2006
14 years 25 days ago
Incremental Least Squares Policy Iteration for POMDPs
We present a new algorithm, called incremental least squares policy iteration (ILSPI), for finding the infinite-horizon stationary policy for partially observable Markov decision ...
Hui Li, Xuejun Liao, Lawrence Carin
UAI
2008
14 years 26 days ago
Sparse Stochastic Finite-State Controllers for POMDPs
Bounded policy iteration is an approach to solving infinitehorizon POMDPs that represents policies as stochastic finitestate controllers and iteratively improves a controller by a...
Eric A. Hansen
AAAI
2007
14 years 1 months ago
Point-Based Policy Iteration
We describe a point-based policy iteration (PBPI) algorithm for infinite-horizon POMDPs. PBPI replaces the exact policy improvement step of Hansen’s policy iteration with point...
Shihao Ji, Ronald Parr, Hui Li, Xuejun Liao, Lawre...
VALUETOOLS
2006
ACM
176views Hardware» more  VALUETOOLS 2006»
14 years 5 months ago
How to solve large scale deterministic games with mean payoff by policy iteration
Min-max functions are dynamic programming operators of zero-sum deterministic games with finite state and action spaces. The problem of computing the linear growth rate of the or...
Vishesh Dhingra, Stephane Gaubert