Sciweavers

68 search results - page 6 / 14
Feature-Discovering Approximate Value Iteration Methods
ATAL
2008
Springer
Sigma point policy iteration
In reinforcement learning, least-squares temporal difference methods (e.g., LSTD and LSPI) are effective, data-efficient techniques for policy evaluation and control with linear v...
Michael H. Bowling, Alborz Geramifard, David Winga...
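The sigma-point variant proposed in this paper is not reproduced here; for orientation only, the snippet below is a minimal sketch of plain batch LSTD policy evaluation with a linear value function, assuming a hypothetical feature map phi and a set of transitions sampled under a fixed policy.

    import numpy as np

    def lstd(transitions, phi, n_features, gamma=0.99, reg=1e-6):
        # Plain batch LSTD policy evaluation: solve A w = b from sampled data.
        # transitions: iterable of (s, r, s_next) gathered under a fixed policy.
        # phi: assumed feature map, phi(s) -> np.ndarray of shape (n_features,).
        A = reg * np.eye(n_features)          # small ridge term keeps A invertible
        b = np.zeros(n_features)
        for s, r, s_next in transitions:
            f, f_next = phi(s), phi(s_next)
            A += np.outer(f, f - gamma * f_next)
            b += r * f
        # weights of the linear value estimate: V(s) is approximately phi(s) @ w
        return np.linalg.solve(A, b)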
CDC
2010
IEEE
Q-learning and enhanced policy iteration in discounted dynamic programming
We consider the classical finite-state discounted Markovian decision problem, and we introduce a new policy iteration-like algorithm for finding the optimal state costs or Q-facto...
Dimitri P. Bertsekas, Huizhen Yu
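Bertsekas and Yu's enhanced policy-iteration scheme is not sketched here; as a baseline for comparison, the snippet below shows only the standard tabular Q-learning update for a finite-state discounted problem that such algorithms refine. The env_step interface and the start state are assumptions made for illustration.

    import numpy as np

    def q_learning(env_step, n_states, n_actions, gamma=0.95, alpha=0.1,
                   epsilon=0.1, episodes=500, horizon=200, seed=0):
        # Standard tabular Q-learning; env_step(s, a) -> (s_next, reward, done)
        # is an assumed environment interface.
        rng = np.random.default_rng(seed)
        Q = np.zeros((n_states, n_actions))
        for _ in range(episodes):
            s = 0                               # assumed start state
            for _ in range(horizon):
                # epsilon-greedy exploration over the current Q-factors
                if rng.random() < epsilon:
                    a = int(rng.integers(n_actions))
                else:
                    a = int(Q[s].argmax())
                s_next, r, done = env_step(s, a)
                # move Q[s, a] toward the one-step Bellman target
                Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
                s = s_next
                if done:
                    break
        return Q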
ICC
2009
IEEE
End-to-End Delay Approximation in Cascades of Generalized Processor Sharing Schedulers
This paper proposes an analytical method to evaluate the delay violation probability of traffic flows with statistical Quality-of-Service (QoS) guarantees in a Generalize...
Paolo Giacomazzi, Gabriella Saddemi
ICML
2006
IEEE
Automatic basis function construction for approximate dynamic programming and reinforcement learning
We address the problem of automatically constructing basis functions for linear approximation of the value function of a Markov Decision Process (MDP). Our work builds on results ...
Philipp W. Keller, Shie Mannor, Doina Precup
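Keller, Mannor, and Precup construct basis functions from Bellman errors embedded with neighborhood component analysis and work from samples; that method is not shown here. The toy sketch below illustrates only the generic residual-as-feature idea for a small Markov chain whose transition matrix P and reward vector r are assumed to be known.

    import numpy as np

    def grow_basis(P, r, gamma=0.95, n_new=5):
        # Start from a constant feature and repeatedly append the (normalized)
        # Bellman residual of the current best linear fit as a new basis function.
        n = P.shape[0]
        Phi = np.ones((n, 1))                   # initial basis: constant feature
        for _ in range(n_new):
            # least-squares fixed-point weights for the current basis
            A = Phi.T @ (Phi - gamma * (P @ Phi))
            b = Phi.T @ r
            w = np.linalg.lstsq(A, b, rcond=None)[0]
            v = Phi @ w
            residual = r + gamma * (P @ v) - v  # Bellman residual of the fit
            norm = np.linalg.norm(residual)
            if norm < 1e-8:
                break                           # value function already captured
            Phi = np.hstack([Phi, residual[:, None] / norm])
        return Phi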
ECML
2004
Springer
Convergence and Divergence in Standard and Averaging Reinforcement Learning
Although tabular reinforcement learning (RL) methods have been proved to converge to an optimal policy, the combination of particular conventional reinforcement learning techniques...
Marco Wiering