Sciweavers

10 search results - page 2 / 2
» Near-optimal Regret Bounds for Reinforcement Learning
ICML
2006
IEEE
14 years 9 months ago
PAC model-free reinforcement learning
For a Markov Decision Process with finite state (size S) and action spaces (size A per state), we propose a new algorithm, Delayed Q-Learning. We prove it is PAC, achieving near o...
Alexander L. Strehl, Lihong Li, Eric Wiewiora, Joh...
ML
2002
ACM
13 years 8 months ago
Finite-time Analysis of the Multiarmed Bandit Problem
Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions while t...
Peter Auer, Nicolò Cesa-Bianchi, Paul Fisch...
ATAL
2006
Springer
14 years 8 days ago
A hierarchical approach to efficient reinforcement learning in deterministic domains
Factored representations, model-based learning, and hierarchies are well-studied techniques for improving the learning efficiency of reinforcement-learning algorithms in large-sca...
Carlos Diuk, Alexander L. Strehl, Michael L. Littm...
JMLR
2012
11 years 11 months ago
Contextual Bandit Learning with Predictable Rewards
Contextual bandit learning is a reinforcement learning problem where the learner repeatedly receives a set of features (context), takes an action and receives a reward based on th...
Alekh Agarwal, Miroslav Dudík, Satyen Kale,...
CIMCA
2008
IEEE
14 years 3 months ago
Tree Exploration for Bayesian RL Exploration
Research in reinforcement learning has produced algorithms for optimal decision making under uncertainty that fall within two main types. The first employs a Bayesian framework, ...
Christos Dimitrakakis