Sciweavers

Search results for "Approximate Policy Iteration with a Policy Language Bias"
CoRR 2012, Springer
An Incremental Sampling-based Algorithm for Stochastic Optimal Control
In this paper, we consider a class of continuous-time, continuous-space stochastic optimal control problems. Building upon recent advances in Markov chain approximation ...
Vu Anh Huynh, Sertac Karaman, Emilio Frazzoli
CoRR 2010, Springer
Global Optimization for Value Function Approximation
Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds. We propose a new approximate bili...
Marek Petrik, Shlomo Zilberstein
IAT 2008, IEEE
Introducing Communication in Dis-POMDPs with Locality of Interaction
Networked Distributed POMDPs (ND-POMDPs) can model multiagent systems in uncertain domains and have begun to scale up in the number of agents. However, prior work in ND-POMDPs has ...
Makoto Tasaki, Yuichi Yabu, Yuki Iwanari, Makoto Y...
ICML 1999, IEEE
Least-Squares Temporal Difference Learning
Excerpted from: Boyan, Justin. Learning Evaluation Functions for Global Optimization. Ph.D. thesis, Carnegie Mellon University, August 1998. (Available as Technical Report CMU-CS-...
Justin A. Boyan
NIPS 2007
Incremental Natural Actor-Critic Algorithms
We present four new reinforcement learning algorithms based on actor-critic and natural-gradient ideas, and provide their convergence proofs. Actor-critic reinforcement learning m...
Shalabh Bhatnagar, Richard S. Sutton, Mohammad Gha...